
"Nobody really understands git" is the truest part of that. While hyperbolic, it really has a lot of truth.

It's always a bit frustrating when working with a team because everyone understands a different part of git and has slightly different ideas of how things should be done. I still routinely have to explain to others what a rebase is and others have to routinely explain to me what a blob really is.

In a team of even moderate size, teaching and learning git from each other is a regular task.

People say git is simple underneath, and that if you just learn its internal model, you can ignore its complex default UI. I disagree. Even learning its internal model leads to surprises all the time, like blobs: I keep forgetting why they aren't just called files.
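For what it's worth, the "blob vs. file" distinction is easy to see with plumbing commands: a blob stores only content, while filenames live in tree objects. A minimal sketch in a throwaway repo (assumes git is installed; filenames are made up):

```shell
# Two differently named files with identical content share a single blob.
dir=$(mktemp -d) && cd "$dir" && git init -q .
echo "hello" > a.txt
echo "hello" > b.txt
git add a.txt b.txt
# Both staged entries show the same object hash: the blob is the content;
# the tree object is what maps filenames onto blobs.
git ls-files -s
```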



The day I got past what felt like the end of the steep part of the learning curve, everything made so much sense. Everything became easy to do. I've never been confused or unsure of what was going on in git since.

What git needs is a chair lift up that hill. A way to easily get people there. But I have no idea what that would look like. Lots of people try, few do very well at it.


The whole point about abstractions is you shouldn't need to understand the internals to use them. If the best defense of git is "once you understand the inner workings, it's so clear" then it is by definition a poor abstraction.


Who said it's supposed to be an abstraction? The point, theoretically, of something like Git is that the actual unvarnished model is clear enough that you don't need an abstraction. The problem IMO is that the commands are kind of random and don't map cleanly to the model.


Indeed, the worst offenders, in my opinion, are checkout, reset and pull.

Each of them mixes several only loosely related operations into a single command.
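To make the checkout overloading concrete, here's a sketch of three unrelated jobs hiding behind one command (assumes git; the switch/restore split needs git 2.23+; names are made up):

```shell
dir=$(mktemp -d) && cd "$dir" && git init -q .
git -c user.name=demo -c user.email=d@e commit -q --allow-empty -m "init"
echo "original" > file.txt && git add file.txt
git -c user.name=demo -c user.email=d@e commit -q -m "add file"

echo "edited" > file.txt
git checkout -- file.txt   # job 1: throw away working-tree changes to a path
git checkout -q -b topic   # job 2: create a new branch and switch to it
git checkout -q -          # job 3: switch back to the previous branch

# Git 2.23 finally split these roles into purpose-built commands:
git switch -q topic
git restore file.txt
```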


There are a couple of projects that try to tackle this problem by providing an alternative CLI (on top of git's own plumbing), like gitless and g2. I haven't used any of them myself, but I'd be interested in others' experiences.


Any interface means you'll build a mental model of the system you're manipulating. How else could you possibly know what you want to do and what commands to issue?

So given a mental model is inevitable, seems reasonable that that model should be the actual model.


You don't need to understand how media is encoded to watch a movie or listen to a song. You don't need to understand the on disk format of a Word document to write a letter. When writing a row to an SQL database I don't always understand how that software is going to record that data, but I do know I can use that SQL abstraction to get it back out.


> You don't need to understand how media is encoded to watch a movie or listen to a song.

I recall the time when mp3 was too demanding for many CPUs, so you had to convert to non-compressed formats. Today you do need to know that downloading non-compressed audio will cost you a lot of network traffic. Once performance is a concern, all abstractions have to be discarded.


Exactly. If you stick to the very basics with git, you can live a happy life never caring about the internals. If, however, you want to dig into the depths of Git and use all its power, I don't get why people don't expect an obvious learning curve.

Same exact thing above applies to so many things in software development, from IDEs, to code editors (Vim/Emacs/Sublime/etc), to programming languages, to deploy tools, the list goes on. There’s a reason software development is classified as skilled labor and not a low end job generally. You’re expected to have knowledge of, or be willing to learn a lot, to do your job.


The difference is that the video model abstracts over the encoding, the git model does not abstract over the storage model, it exposes it. git commands are operations on a versioned blob store.


It's not versioned.


> So given a mental model is inevitable, seems reasonable that that model should be the actual model.

I think the longevity of SQL has proved there's value in non-leaky abstracted interfaces.


> I think the longevity of SQL has proved there's value in non-leaky abstracted interfaces.

How is SQL non-leaky? To be proficient with SQL you have to understand how results are stored on disk, how indexes work, how joins work, etc. To debug and improve queries you need to look at the query plan, which is the database exposing its inner workings to you.

You have to know about the abstractions an SQL server sits on as well. Why is it faster if it's on an SSD instead of an HDD? Why does the data disappear if it's an in-memory DB?


> To be proficient with sql you have to understand how results are stored on disk, how indexes work, how joins work, etc

No, you don’t. As far as I know, the data is stored in discrete little boxes and indexes are a separate stack of sorted little boxes connected to the main boxes by spaghetti. This is the abstraction, it works, and I don’t need to know about btrees, blocksizes, how locks are implemented, or anything else to grok a database.


You've never had to look at a query plan that explains what the database is doing internally? If not then I wouldn't consider you proficient, or you've only ever worked with tiny data sets.

Have you created an index? Was it clustered or non-clustered? That's not a black box, that's you giving implementation details to the database.
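SQLite makes this leak easy to demonstrate; other engines expose the same information through their own EXPLAIN variants. A sketch, assuming the sqlite3 CLI is installed (table and column names are made up for illustration):

```shell
sqlite3 :memory: <<'SQL'
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
-- No index on email yet: the planner reports a full table scan.
EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'a@example.com';
CREATE INDEX idx_users_email ON users(email);
-- Same query, now an index search: storage details leaking by design.
EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'a@example.com';
SQL
```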


I don’t think being a professional DBA managing an enterprise Oracle installation is isomorphic to the general populace that might use git.

There’s no question that knowing more will get you more, but I think for the question of “when will things go sideways and I need to understand internals to save myself”, one would be able to use a relational database with success longer than git, getting by on abstractions alone. Running a high-performance installation of either is really outside the scope of the original point.


Those things don't generally influence how you structure the query, though - you can choose to structure your query to fit the underlying structure better, or you can modify the underlying structure to better fit your data and the manipulations you are trying to perform.

Yes, most of us will have to do both at some point, but they can be thought of as discrete skills.


This isn't a bad analogy though. Git itself is similar - once you understood the graph-like nature of commits (which isn't all that complicated to begin with), it's generally not hard to skim through a repository and understand its history. Diffing etc. is also simple enough this way.

If, on the other hand, you are working to create said history (and devise/use an advanced workflow for that), it's very helpful if you understand the underlying concepts. Which also goes for designing database layouts - someone who doesn't understand the basics of the optimizer will inevitably run into performance problems, just as someone who doesn't understand Git's inner workings will inevitably bork the repository.


You don't need to know more than SQL to manipulate the data. The semantics of your query are fully contained in the SQL.

You may need to go deeper and understand the underlying model if you want performance, but sticking to normal form can make that unnecessary for a lot of people a lot of the time.

You can have a useful separation of work between a developer understanding/using sql and a DBA doing the DDL part and the optimization when needed.


You have a very high standard for what 'proficient' means, and yet a very low one.

That is, I am not proficient with relational databases, and I can handwave why an SSD is faster, and why data may disappear from an in-memory DB.

But I couldn't do an outer join without help. Nor do I know when I would want to do one.

Bob Martin wrote the essay at http://blog.cleancoder.com/uncle-bob/2017/12/09/Dbtails.html , in which he writes:

> Relational databases abstract away the physical nature of the disk, just as file systems do; but instead of storing informal arrays of bytes, relational databases provide access to sets of fixed sized records.

This isn't true. SQLite does not use fixed size records.

This suggests to me that a lot of people who consider themselves proficient with SQL don't know how the results are stored on disk, nor the difference between the SQL model and the actual implementation details, making them not proficient under your definition.


> That is, I am not proficient with relational databases, and I can handwave why an SSD is faster, and why data may disappear from an in-memory DB.

Because you know that information for other reasons as most people would. Just because the information is gained for other reasons does not make it irrelevant when using a database though.

> This isn't true. SQLite does not use fixed size records.

It's actually true of most/all modern databases these days. The point isn't knowing the exact structure the database uses to store its information (even though that can be useful) but knowing how efficiently it can find the information for any given request. Knowing when a database is doing an index lookup versus a full table scan is very important, and I wouldn't consider someone who can't make a reasonable guess to be proficient in SQL. Many of these details are even exposed in the SQL itself: when you create an index and decide whether it's clustered or non-clustered, you're giving the database specific directions about how the data will be physically stored.

The fact that you need to know anything about how they do their work internally to be reasonably competent at using them makes them a leaky abstraction.


SQL leaks for complex queries and schemas if performance needs to be optimized. I argue virtually all abstractions leak heavily when performance is considered, some more than others. SQL leaks relatively little in comparison to some other technologies IME.

Also, SQL has well-established processes and formalisms to design schemas which generally result in solid performance by themselves. That's what RDBMS are around for, after all: enabling efficient and consistent record-oriented data manipulation. This is quite difficult to do correctly in reality; for example, if you write your own transaction mechanism for disk/solid-state storage, you are going to do it wrong. This is genuinely difficult stuff.

There is a ton of internals that SQL abstracts so well that very few DB programmers know or (have to) care about them. Things like commit and rollback protocols, checkpointing, on-disk layouts, I/O scheduling, page allocation strategies, caching etc.


You wrote "Just because the information is gained for other reasons does not make it irrelevant when using a database though."

Certainly. My comment, however, concerned what you meant by 'proficient', and not simple use.

You used the qualifier "all modern databases". Was that meant to imply that SQLite is not a modern database?

My point remains that there are many people who are proficient in SQL, and would do very well with SQLite, even without knowing the on-disk format.

That is why I disagree with your use of the term "proficient".


You seem to be talking about a different kind of leakiness. In my mind, there are two kinds: conceptual and performance leakiness. You are talking about the latter. Pretty much any non-trivial system on modern hardware leaks performance details. From what I understand, git's UI tries to provide a different model than the actual implementation, but still leaks a lot of the implementation's details.


It probably should be homomorphic to the actual model, but not the actual model. The map cannot be the terrain.


I disagree with that. The point of an abstraction is not having to know the implementation. That said, understanding the principles behind it will always lead to much better use of the abstraction.


I'd also say an abstraction could be carrying its weight even if it only reduces the amount you have to think about the implementation details when using it.


Leaky abstractions is how we get stuff like ORMs.


To be fair, most ORMs poorly implement the "leaky" principle. When implemented well, like with SQLAlchemy, the end result is a much nicer ORM.

In fact, one of the things in common among the ORMs that have left a bad taste in my mouth is that they all tried to abstract away SQL without leaking enough of it.


I have a different thesis:

Picking the ideal interface to abstract is critically important (and very hard).

In the case of ORMs, available solutions abstract the schema (tables, rows, fields), the objects, or use templates. My solution abstracted JDBC/ODBC. The only leak in my abstraction was missing metadata, which I was able to plug (with much effort!).

My notions for interfaces, modularity, abstractions are mostly informed by the book "Design Rules: The Power of Modularity". http://a.co/hXOGJq1


This might sound a little out of touch, but am I the only one who doesn't think git is that hard? It is a collection of named pointers and a directed acyclic graph. The internals aren't really important once you have that concept down.
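Both pieces of that model are directly inspectable, which is a decent way to convince yourself of it. A sketch in a throwaway repo (assumes git's default "files" ref backend):

```shell
dir=$(mktemp -d) && cd "$dir" && git init -q .
git -c user.name=demo -c user.email=d@e commit -q --allow-empty -m "first"
git -c user.name=demo -c user.email=d@e commit -q --allow-empty -m "second"
# A branch is literally a named pointer: a file containing one commit hash.
cat .git/refs/heads/*
# A commit object lists its parent(s); those edges are the DAG.
git cat-file -p HEAD
```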


But what about the deal breaker in the article: a way to follow the descendants of a commit.


Took a few seconds with a search engine on "git descendants of a commit": https://stackoverflow.com/questions/27960605/find-all-the-di...

That said, I do feel some "porcelain" git commands are poorly named and operate inconsistently -- compared to the plumbing of the acyclic graph concepts which is good but limited.


So, in git, to show descendants of a commit, you use

    git rev-list --all --parents | grep "^.\{40\}.*<PARENT_SHA1>.*" | awk '{print $1}'
whereas in fossil, you use

    fossil timeline after <COMMIT>
I mean, one of these looks just a little more straightforward than the other, doesn't it?

Also, a cursory test in a local git repo just now showed that command seems to print out only immediate descendants--i.e., unless that commit is the start of a branch, it's only going to tell you the single commit that comes immediately after it, not the timeline of activity that fossil will--and all it gives you is the hash of those commit(s), with no other information.

I use git myself, not fossil, but if this is something you really want in your workflow, fossil is a pretty clear win.


I mean, sure. He really wanted this feature in fossil, gave it a first-class command-line UI, and it's super easy.

How many other ways of looking at commits or trees are there, that are hard in git but impossible in fossil because the author didn’t feel like it?


I don't know why they have the need to retrieve the hash of the descendant commit, but usually what I'm doing is: I use a decent visual tool and just follow the branch (sourcetree).

You could alternatively use:

    git log --graph


  git log <COMMIT>..


`git log` stays on the current branch unless you give it the `--all` option. But when you give it `--all`, the `<COMMIT>..` limitation no longer works. So not a solution.


  git log --all --ancestry-path ^<COMMIT>


You mean you didn't just read git's easy-to-follow, well-structured man pages or built-in help? /s

Half the time, even when I know what I want, I keep forgetting git's flags and sub-commands, and struggle to find them in the man pages.

Like this gem:

git diff --name-only # lists only the changed files, but it's not called --list-files.


Diff lists files without --name-only though, so the flag specifies you want _only_ the filenames of those with a diff.


True, but it's inconsistent with, e.g., grep.


Getting only the changed filenames is a fairly specialised operation. Often in normal use you can get away with a more generic operation that comes close to what you need, but is way more common, e.g.

    git diff --stat


If you're new to tech or you've got a different mental model of how version control works, getting across the gap to git is a challenge.

My current team are mostly controls engineers, working on PLCs. But the software we're now working with has its configurations tracked in git. These aren't dumb people, they're quite talented, but their education wasn't in CS, and "directed acyclic graph" is not a thing they have a mental model for.


No you're definitely not the only one. Git is one of the simplest and dumbest tools developers have at our disposal. People's inability to conceptualize a pretty straight forward graph is something no amount of shiny UI can ever fix.

I don't understand HN's hardon for hating Git.


Sure, and a piece table is a simple way to represent a file's contents. But if anyone wrote a shell or a text editor that required you to directly interact with the piece table to edit a file—instead of something sane—then they'd rightfully be called out on it. It wouldn't matter how much you argued about how simple the piece table is to understand, and it wouldn't matter how right you were about how simple the piece table is to understand. It's the wrong level of abstraction to expose in the UI.


The only thing Git can really fix is changing its command flags to be consistent across aliases/internal commands. That's about it. The whole point of an SCM is that graph that you want to move away from. People have asserted your claim many times but can't ever give specific things to fix about the "abstraction."

There are about 5/6 fundamental operations you do in git/hg. If that's too much then again, there's not an abstraction that is going to help you out.


See, you're trying to foist a position on me that isn't mine—that I'm scared of the essential necessities of source control. And you act as if source control were invented with Git. Neither of these are true.

> git/hg

Mercurial was a great solution to the same problem that Git set out to tackle, virtually free of Git's foibles. The tradeoff was a few minor foibles of its own, but a much better tool. It's a fucking shame that Git managed to suck all the air out of the room, and we're left with a far, far worse industry standard.


>Mercurial was a great solution to the same problem that Git set out to tackle, virtually free of Git's foibles.

No, Mercurial's design is fundamentally inferior to Git, and practically the entire history of Mercurial development is trying to catch up to what Git did right from the start. For example having ridiculous "permanent" branches -> somebody makes "bookmarks" plugin to imitate Git's lightweight branches -> now there are two ways to branch, which is confusing. No way to stash -> somebody writes a shelve plugin -> need to enable plugin for this basic functionality instead of being proper part of VCS. Editing local history is hard -> Mercurial Queues plugin -> it's still hard -> now I think they have something like "phases". In Git all of this was easy from the start.

Another simple thing. How to get the commit id of the current revision. Let's search stack overflow:

https://stackoverflow.com/questions/2485651/print-current-me...

The top answer is `hg id -i`.

    $ hg id -i
    adc56745e928
The problem is, this answer is wrong! This simple command can execute for hours on a large enough repository, and requires write privileges to the repository! Moreover, it returns only a part of the hash. There's literally no option to display the full hash.

The "correct" answer is `hg parent --template '{node}'`. Except `hg parent` is apparently deprecated, so the actual correct way is some `hg log` invocation with a lot of arguments.


I would not call "hg log -r tip" a lot of arguments.

Also, on the git/hg debate, I feel I've had problems (like stashing your modifications and redownloading everything) more often with git than with hg. I mean, perhaps it says something about my ability to understand a directed acyclic graph, but hg seems less brittle when I'm using it.


I disagree with some of your comments. Is git stash really essential, or unneeded complexity? That's debatable; I never use it personally.

What I don't like in git is the loss of history associated with squashing commits. I would prefer having a 'summary' that keeps the full history but by default is used like a single commit.


In git you can use merge commits as your "summary" and `--first-parent` or other DAG depth flags to `git log` (et al) to see only summaries first. From the command line you can easily add that to your aliases and not worry about it. I think that if GitHub had a better way to surface that in their UI (i.e., default to `--first-parent` and have accordions or something to dive deeper), there would be a lot less squashing in git life. (Certainly, I don't believe in branch squashing.)

The DAG is already powerful enough to handle both the complicated details and the top-level summaries, it's just dumb that the UIs don't default to smarter displays.

(I find git stash essential given that `git add --interactive` is a painful UX compared to darcs and git doesn't have anything near darcs' smarts for merges when pulling/merging branches. Obviously, your mileage will vary.)
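The merge-as-summary idea above can be sketched concretely (branch and message names are made up):

```shell
dir=$(mktemp -d) && cd "$dir" && git init -q .
g() { git -c user.name=demo -c user.email=d@e "$@"; }
g commit -q --allow-empty -m "mainline work"
git checkout -q -b topic
g commit -q --allow-empty -m "detail: step 1"
g commit -q --allow-empty -m "detail: step 2"
git checkout -q -
g merge -q --no-ff -m "summary: merge topic" topic
# Plain log shows all four commits; --first-parent walks only the mainline,
# so each merge commit reads as one "summary" entry -- no squashing needed.
git log --oneline --first-parent
```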


>you're trying to foist a position on me that isn't mine

I just said you can't give specifics on what to change, because there isn't much to change.

>And you act as if source control were invented with Git

No I'm not?

>and we're left with a far, far worse industry standard.

Yeah, we definitely should have gone with the system that can't do partial checkouts correctly or even roll things back. Branching name conflicts across remote repositories and bookmark fun! Git won for a reason, because it's good and sane at what it does.


That reason was called Linus and Linux kernel development.

The master can do no wrong.


No, the reason is mercurial sucked at performance with many commits at the time, and was extra slow when merging.

Lacked a few dubious features such as merging multiple branches at the same time too.

It has improved, but git is still noticeably more efficient with large repositories. (An almost direct comparison: any operation on the Firefox repository vs. its git port.)


Mercurial has always been better than Git on Windows.

Those dubious features are so relevant to daily work that I didn't even know they existed.


Git's main target is Linux, obviously. Performance on the truly secondary platform was not relevant, and the slowness there is mostly caused by the slow lstat call.

Mercurial instead uses an additional cache file, which is slower on Linux with big repos but happens to be faster on Windows.

And the octopus merge is used by kernel maintainers sometimes if not quite a lot. That feature is impossible to add in Mercurial as it does not allow more than two commit parents.


Which reinforces the position that git should have stayed a Linux kernel specific DVCS, as the Bitkeeper replacement it is, instead of forcing its use cases on the rest of us.


>Which reinforces the position that git should have stayed a Linux kernel specific DVCS

No it doesn't? People use octopus merges all the time, every single day.


Well, I only get blank stares when I mention octopus merges around here.


...as I get stares (okay, mostly of fear) if I point out that we need a branch in my workplace. What you can/can't do (sanely) with your tool shapes how you think about its problem space.

To emphasize that even more: Try to explain the concept of an ML-style sum type (i.e. a discriminated union in F#) to someone who only knows languages with C++-based type systems. You'll have a hard time to even explain why this is a good idea, because they will try to map it to the features they know (i.e. enums and/or inheritance hierarchies), and fail to get the upsides.


Easy, it is called std::variant, available since C++17.


Yeah, I guess. Except that std::variant is basically a glorified C union with all the drawbacks that entails.


But git didn't force its use on anybody, lol. If you need a scapegoat, try GitHub!


You wrote: People have asserted your claim many times but can't ever give specific things to fix about the "abstraction."

That seems like you made an assertion as well. I think there are counter-examples.

For example, the point of gitless is (quoting http://gitless.com/ ):

> Many people complain that Git is hard to use. We think the problem lies deeper than the user interface, in the concepts underlying Git. Gitless is an experiment to see what happens if you put a simple veneer on an app that changes the underlying concepts

Some commentary is at https://blog.acolyer.org/2016/10/24/whats-wrong-with-git-a-c... .

Many HN discussions as well, including https://news.ycombinator.com/item?id=6927485 .


> The whole point of an SCM is that graph that you want to move away from.

I think that's an exaggeration. For example, Darcs and Pijul aren't based around a "graph of commits" like Git is, they use sets of inter-dependent patches instead. I'm sure there are other useful ways to model DVCS too.

Whilst this is mostly irrelevant for Git users, you mentioned Mercurial so I thought I'd chime in :)

> The only thing Git can really fix is changing it's command flags to be consistent across aliases/internal commands.

I mostly agree with this: Git is widespread enough that it should mostly be kept stable; anything too drastic should be done in a separate project, either an "overlay", or a separate (possibly Git-compatible) DVCS.


>For example, Darcs and Pijul aren't based around a "graph of commits" like Git is, they use sets of inter-dependent patches instead.

I said graph, I didn't say which graph. Both systems still use graphs. And still a graph you have to understand how to edit with each tool. The abstraction is still the same, and if you have problems with Git, you're going to have problems with either of those tools as well. The abstraction is not the problem; it's the developer's inability to conceptualize the model in their head.

Where is the exaggeration?


> I said graph, I didn't say which graph

You said "that graph" which, in context, I took to mean the git graph.

> Both systems still use graphs

True

> The abstraction is still the same

Not at all, since those graphs mean different things. Each makes some things easier and some things harder. For example, time is easy in git ("what did this look like last week?"). Changes are easy in Darcs ("does this conflict with that?"). Both tools allow the same sorts of things, but some are more natural than others. I think it's easy enough to use either as long as we think in its terms; learning to think in those terms may be hard. For git in particular, I think the CLI terminology doesn't help with that (e.g. "checkout").

> if you have problems with Git, you're going to have problems with either of those tools as well

Not necessarily. As a simple example, some git operations "replay" a sequence of commits (e.g. cherry-picking). I've often had sequences which introduce something then later remove it (bugs, workarounds, stubs, etc.). If there's a merge conflict during the "replay", I'll have to spend time manually reintroducing those useless changes, just so I can resume the "replay" which will remove them again.

From what I understand, in Darcs such changes would "cancel out" and not appear in the diff that we end up applying.

> Where is the exaggeration?

The idea that "uses a graph" implies "equally hard to use". The underlying datastructure != the abstraction; the semantics is much more important.

For example, the forward/back buttons of a browser can be implemented as a linked list; blockchains are also linked lists, but that doesn't mean that they're both the same abstraction, or that understanding each takes the same level of knowledge/experience/etc.


>The idea that "uses a graph" implies "equally hard to use".

What I'm getting at is that if you don't understand what the graph entails, and what you need to do to the graph, any system is going to be "hard to use." This idea that things should immediately make sense without understanding what you're asking the system to do is just silly.

I've never seen someone who understands git, darcs, mercurial, pijul, etc go "I totally understand how this data is being stored but it's just so hard to use!" I don't think that can be the case, because any of the graphs those applications choose to use have some shared cross section of operations:

* add

* remove

* merge

* reorder

* push

* pull

I see people confused about the above, because they don't understand what they're really asking the system to do. I don't think any abstraction is ever going to solve that.

Git does have a problem with its command line (or at least how inconsistent and ambiguous it can sometimes be), but you really should get past it after a week or two of using it. The rest is on you. If you know what you want/need to do, getting past the CLI isn't hard. People struggle with the former and so they think the latter is what's stopping them.


Could you please remove the thorniness and condescension from your posts? It breaks the guidelines and makes discussions worse.

https://news.ycombinator.com/newsguidelines.html


Can you tell the other guy not to post false and disingenuous statements? Because I'm pretty sure that is what degrades discussions, not any tone I choose to exhibit. I highly encourage you to read the thread thoroughly. If I switched my position on git we wouldn't be having this discussion, as evidenced elsewhere in the thread, where people on the side with popular support on this forum take a notably blunter tone than I do.

I posted a bald statement. He replied directly with snide remarks and fallacies. Look at the timestamps and edits. I have every right to be annoyed, and to make it known that I am annoyed in my posts, when the community refuses to consistently adhere to the guidelines.

Enforce guidelines that keep discussions rational, not because people don't want to be accosted in public for their misleading, emotionally bloated statements.


It doesn't matter what you're replying to. The guidelines always apply, so please follow them.


>The guidelines always apply

They are currently not being applied. Is it fair for me to point out how inconsistently the posts are being treated?

https://news.ycombinator.com/newsguidelines.html:

>Don't say things you wouldn't say face-to-face. Don't be snarky.

"Every day humans make me again realize that I love my dogs, and respect my dogs, more than humans. There are exceptions but they are few and far between." [2]

>Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize.

"Which sort of doesn't matter since everyone thinks GitHub is source management." [1]

>Please don't post shallow dismissals

"You all lost out on "the most sane and powerful" as a result." [1]

"Calling it a sane and powerful source control tool is just not supported by the facts, calling "the most ..." is laughable." [1]

"Calling Git sane just makes it clear that you haven't used a sane source management system." [1]

"Lots of people are too busy/whatever to know what they are missing, maybe that's you. It's not me" [3]

>When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."

"Arguing with some random dude who thinks he knows more than me is not really fun." [3]

"Dude, troll much?" [4]

[1] https://news.ycombinator.com/item?id=16806588

[2] https://news.ycombinator.com/item?id=16807652

[3] https://news.ycombinator.com/item?id=16806877

[4] https://news.ycombinator.com/item?id=16807763

At least I had the decency to correct dishonest statements with vanilla citations in my posts, and they still got ignored.


Some of the things you quoted there are admittedly borderline, but you went much further across the nastiness line. Could you please just not do that? It isn't necessary, and it weakens whatever substantive points you have.


>borderline, but you went much further across the nastiness line.

I didn't insinuate that people are worth less than pets that I bought and own (who can't even choose who to be dependent on) because they don't agree with my perspectives over a piece of software. In what context would this be an acceptable statement to make face to face or in a public setting and you go "well, you know, it's kind of okay to say!"

I'm exceedingly interested in where I crossed that line in a considerable manner because that's one distant line to cross. Next time someone says something I perceive to be incorrect, or they get on my nerves for continually disagreeing with me, I'll be sure to tell them my dog is worth more than them since that's actively being allowed and has a precedent of moderator support.

And for the record, my tone is probably "abrasive" in this post because the above actions and outright blind eye towards outright lies and uncalled for statements is aggravating. I have a feeling you're not doing anything just because of who he is, and not because what he is saying is warranted or even accurate (it's definitely not, as I demonstrated across several different posts).

I've archived this thread so people are free to review my actions and moderator actions at a later date: https://web.archive.org/web/20180411062201/https://news.ycom...

I've said my piece.


Exactly, it was no longer mysterious for me after I had to prepare a written branching procedure for our team, starting from how to branch off, commit, and rebase, to doing resets and working with the reflog. While doing that I thoroughly read the official docs, examined lots of examples, and created a local repo with a couple of text files to test various commands. And then it became so clear and simple! Especially the reflog – so powerful!

So, my advice is to try to write some instructions for yourself for all the common cases you might run into during your work. It will not only help you realise what you actually need from git, but also will serve as a good cheat-sheet.


This looks like a good chair lift: http://gitless.com/


Git Pro's chapter on git internals does a good job of explaining some of the things going on under the hood.

https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Po...


I wrote a book that tries to :)

https://leanpub.com/learngitthehardway

I start with simple examples and work up from there. It's based on training I've conducted at various companies, and avoids talk of Merkle trees or DAG.


Like `git help`? It has everything important grouped nicely and points you to even more subcommands.


I am not a git expert or anything, but I have helped resolve weird git issues for my teammates usually using a lot of Google and StackOverflow.

I just know five basic commands: pull, push, commit, branch, and merge. Never ran into any issues. People who run into issues are usually editing git history or doing something fancy with "advanced" commands. I have a feeling these people get into trouble with git because they issue commands without really knowing what those commands do, or even what they want to achieve.


Start working in a repo with submodules and you suddenly have to understand a lot more and can get into trouble with no idea how you did it.


I use submodules every day, never had a problem with them. What do people complain about when it comes to them?

My mental model is basically that they're separate repos, and the main repo has a pointer to a commit in the submodule. Do your work that needs to be done for the submodule, push your changes, and then check out that new commit. Make a commit in the main repo to officially bump the submodule to that new commit. Done.

The annoying part is when you do a pull on the main repo, you have to remember to run git submodule update --recursive.
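That pointer model is easy to verify in a throwaway sandbox. A sketch (all repo names here are invented; `lib` plays the submodule):

```shell
set -e
tmp=$(mktemp -d); export HOME="$tmp"; cd "$tmp"   # sandboxed HOME: no real config touched
git config --global user.email demo@example.com
git config --global user.name demo
git config --global init.defaultBranch main
git config --global protocol.file.allow always    # newer gits block file:// submodules by default

# A library repo that will become the submodule.
git init -q lib
(cd lib && echo v1 > lib.txt && git add lib.txt && git commit -qm "lib v1")

# A superproject that pins lib at its current commit.
git init -q app
cd app
git submodule add "$tmp/lib" lib
git commit -qm "add lib submodule"

# New work lands in lib...
(cd "$tmp/lib" && echo v2 > lib.txt && git commit -qam "lib v2")

# ...but the superproject still points at the old commit until we bump it:
(cd lib && git fetch -q origin && git checkout -q origin/main)
git add lib                          # stages the new pointer, not lib's files
git commit -qm "bump lib to v2"
cat lib/lib.txt
```

After someone else pulls the superproject, they still need `git submodule update --init --recursive` to move their checkout of `lib` to the newly recorded commit.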


Because you have the .gitmodules file, the .git/config file, the index, and .git/modules directory, each of which can get out of sync with the others.

If, for example, you add a submodule with the wrong url, then want to change the url, then you instinctively change .gitmodules. But that won't work, and it won't even nearly work.

If you add a submodule, then remove it, but not from all of those places, and try to add the submodule again (say, to a different path), then you also get weird errors.

If you add a submodule and want to move it to another directory then just no.

Oh and also one time a colleague ran into problems because he had added the repo to the index directly - with git add ..

Oh and let's talk about tracking submodule branches and how you can mess that up by entering the submodule directories and running commands...


Why do you want to bypass the tool at the first glance? Git submodule command has a way to update these urls...


Heh, good question.

But seriously, the fact that there is a .gitmodules file lulls you into a sense that that file is "the configuration file". If you don't know about these other files, then it's natural to edit .gitmodules. When you make errors, fixing those errors is pretty hard. There is no "git submodule remove x" or "git submodule set-url" or "git submodule mv".

For example, do you know, off the top of your head, how to get an existing submodule to track a branch?

How do you think someone who does not quite understand git would do it? Even with a pretty OK understanding of git internals, you can put yourself deep in the gutter. (Case in point: if you enter the submodule directory and push HEAD to a new commit, you can just "git add submodule-directory" to point the submodule to the new commit. But if you were to change the upstream url or branch or something else in the submodule, you're screwed. That's not intuitive by a long shot.)

Edit: git submodule sync is not enough by the way... You can fuck up your repo like crazy even if you sync the two configuration files.
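For the narrow URL-change case at least, the sync dance can be checked in a sandbox (newer gits have since grown `git submodule set-url`; whether sync saves you from the deeper messes above is another story). A sketch with invented repo names:

```shell
set -e
tmp=$(mktemp -d); export HOME="$tmp"; cd "$tmp"   # sandboxed HOME: no real config touched
git config --global user.email demo@example.com
git config --global user.name demo
git config --global init.defaultBranch main
git config --global protocol.file.allow always

git init -q lib-old
(cd lib-old && echo hi > f && git add f && git commit -qm init)
cp -r lib-old lib-new                # pretend the hosting moved

git init -q app
cd app
git submodule add "$tmp/lib-old" lib
git commit -qm "add submodule"

# Editing .gitmodules alone is not enough; sync copies the new URL
# into .git/config AND the submodule's own remote:
git config -f .gitmodules submodule.lib.url "$tmp/lib-new"
git submodule sync lib
git -C lib remote get-url origin     # now points at lib-new
```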


Right, it’s not that hard, but there are some gotchas. The most common problem I see is the local submodule being out of sync with the remote superproject. Pushes across submodules are not atomic. Accidentally working from a detached head then trying to switch to a long out of date branch can be an issue, as can keeping multiple submodules synced to the head on the same branch. Recursive submodules are, as you mentioned, even more fun.


The same problem appears in any non monolithic project. In any SCM I know of.

Git subrepo or subtree are partial solutions, but not quite complete or easy to use.

In some other scms (P4 and SVN, partly hg) the answer is don't do that, which had a whole lot of its own problems.


Oh, so that's what you do!


Heh, I probably made it sound more complicated than it really is. Just think of it as a pointer that needs to be manually updated.


I'm comfortable with most advanced git stuff. I don't touch submodules.


> I don't touch submodules.

What's the alternative? Managing all dependencies by an external dependency manager does not exactly reduce complexity (if you're not within a closed ecosystem like Java + Maven that has a mature, de-facto standard dependency manager; npm might count, too).

It's absolutely not feasible for C++ projects; all projects that do this have horrible hacks upon hacks to fetch and mangle data and usually require gratuitous "make clean"s to untangle.


I use git sub-trees. Actually I love the thing. They give you a 'linear' history, and allow you to merge/pull/push into their original tree, keeping the history (if you require it).


Never heard of them (well, probably in passing); will look into is. Thanks!


Why isn't it feasible for C++ projects?


Oh you can fuck right off with submodules!


^^^ this comment is supposed to be humor, not douchebaggery, by the way. Easy on the downvotes.


I never had any problems in the past 6 years I've been using Git professionally. But then someone asked me what to do when Git prevented them from changing branches; not knowing what they had left unstaged, I told them to stash or commit. They stashed and the changes were gone.

My point is, while your basic commands do the work, your habits and knowledge keep you from losing code like this without you knowing.


Why were the changes gone? Why couldn't they "git stash pop"?


Unstaged or untracked changes were gone. They couldn't get those back after pop. I can't remember which.


Untracked files are not stashed, that is true.


They're also not deleted by "git stash" though.
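Both points are easy to check in a scratch repo: plain `git stash` skips untracked files but also leaves them alone, and `-u` would include them. A sketch (not the grandparent's exact scenario):

```shell
set -e
tmp=$(mktemp -d); export HOME="$tmp"; cd "$tmp"   # sandboxed HOME: no real config touched
git config --global user.email demo@example.com
git config --global user.name demo
git init -q repo && cd repo
echo a > tracked.txt && git add tracked.txt && git commit -qm init

echo change >> tracked.txt           # unstaged modification to a tracked file
echo new > untracked.txt             # brand-new, never-added file

git stash                            # saves only the tracked change...
ls untracked.txt                     # ...the untracked file is still sitting here
git stash pop                        # tracked change comes back

# To stash untracked files as well: git stash -u   (or -a to include ignored files)
```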


Is no one reading git's help pages before running a command the first time?

Not once have I lost code I worked on with git. stash is a reliable companion across branches and large timespans.


I do like Git, most of the time, but really, not a single problem, in six years?

When using Git daily we never really did anything complicated, just a few feature branches per developer, commit, push, pull-request, merge. Basic stuff. We had Git crap out all the time. Never something that couldn't be fixed, but sometimes the fix was: copy your changes somewhere else, nuke your local repo, clone, copy the changes back in, and then commit and continue as normal.


I’ve been using git since 2007 and never ever even wanted to try nuking a checkout and starting over to recover from anything, much less did so. (Did have a nameless terrible Java ide plug-in do it for me once.)


So you're not using checkout, reset, and diff?


Good point, forgot about checkout, diff, clone, blame, add, rm, rebase, init, and probably a few more.

Haven't used reset for my own work though, only when trying to fix someone else's repo.


Or fetch?


I think a lot of people ignore fetch and only ever pull.


I think it's one of the most important sources of my cognitive dissonance around git. It strengthens the illusion that a working directory is somehow related to a git store, which it really isn't.

You have a working directory/checkout - that can be: identical (apart from ignored files) to some version in git; or different.

If it's different ; some or all changes can be marked for storing in the git repo - most commonly as a new commit.

It's a bit unfortunate that the repo typically is inside your work directory/checkout - under '.git' along with some files like hooks, that are not in the repo at all...


But you'd have to stash before pull. At least with my config where a pull will rebase automatically.


I use `git config pull.rebase true` too, but that doesn't mean you _have_ to stash first, just as rebase manually wouldn't - depends if there's a conflict.

Same is true of merge-based pull.


Except for saving some typing, is there any benefit to stash over local branches?

In other words, shouldn't git just fix ux for branches and rip out stash?


So "some typing" would be:

    # git stash:
    prev_ref="$(git rev-parse --abbrev-ref HEAD)"
    git checkout -b wip-stash
    git add .
    git commit -m 'wip stuff'
    git checkout "$prev_ref"
    
    # git stash pop:
    git checkout wip-stash -- .
    git branch -D wip-stash
It's quite a considerable saving. I suppose by "fix UX" you mean make it so the saving would be less anyway, but I think really they're just conceptually different:

    - branch: pointer to a line of history, i.e. a commit and inherently its ancestors
    - stash: a single commit-like dump of patches
If stashing disappeared from git tomorrow, I think I'd use orphan commits rather than branches to replace it.


pull == "just fuck my shit up"


EDIT: I guess I misread it. On reflection what I wrote really doesn't make sense so let me retract.


`cherry-pick` is just plucking a single commit and adding it the commit history of the current branch, and `rebase` is what civilized people use when they don't want merge commits plaguing their entire code base.
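A quick sandbox illustration of that plucking (branch and file names invented):

```shell
set -e
tmp=$(mktemp -d); export HOME="$tmp"; cd "$tmp"   # sandboxed HOME: no real config touched
git config --global user.email demo@example.com
git config --global user.name demo
git config --global init.defaultBranch main
git init -q repo && cd repo
echo base > file && git add file && git commit -qm base

git checkout -q -b feature
echo fix > fix.txt && git add fix.txt && git commit -qm "the one fix we want"
echo wip > wip.txt && git add wip.txt && git commit -qm "unrelated WIP"

git checkout -q main
git cherry-pick feature~1            # pluck just "the one fix we want"
ls                                   # file and fix.txt, but no wip.txt
```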


merge is what civilized people who care about getting history and context in their repository use ;) ... I worked a lot in git using both rebase and merge workflows and I'll be darned if I understand the fear of the merge commit ... If work happened in parallel, which it often does, we have a way of capturing that so we can see things in a logical order ...


Polluting the master repo with a bunch of irrelevant commits isn't giving you context, it's giving you pollution. There's nothing to fear about merge commits. It's about wasting everyone's time by adding your 9 commits to fix a single bug to the history. I work on teams, and we care about tasks. The fact that your task took you 9 commits is irrelevant to me. What is relevant is the commit that shows you completed the task.


It's not really a fear of the merge commit. In a massively collaborative project, almost everything is happening in parallel, and most of that history is not important. The merge makes sense when there is an "official" branch in the project, with a separate effort spent on it. It's likely that people working on that branch rebase within the branch when collaborating, and then merge the branch as a whole when it is ready to join the mainstream.


Ah, you can learn the beauty of merge AND rebase at the same time then...

Here to 'present' feature branches, we take a feature development branch with all the associated crud... Once it's ready to merge, the dev checks out a new 'please merge me' branch, resets (or rebase -i --autosquash) to the original head, and re-lays all the changes as a set of 'public' commits to the subsystems, with proper headings, documentation etc.

At the end, he has the exact same code as the dirty branch, but clean... So he merges --no-ff the dirty branch in (no conflicts, same code!) and then the maintainer can merge --no-ff that nice, clean branch in the trunk/master.

What it gives us is a real, true history of the development (the dirty branch is kept) -- and a nice clean set of commits that is easy to review/push (the clean branch).


Sometimes I want to take a subset of the commits out of a coworker's merge on staging to push to production, and then put all non-pushed commits on top of the production branch to form a new staging branch. I find having a linear history with no merges helpful for reasoning about conflict resolution during this process. What advantages do merged timelines give in this context?


What I like about merges is that it shows you how the conflicts were resolved. You can see the two versions and the resolution, and you can validate it was resolved properly. With a rebase workflow you see the resolutions as if nothing else existed; you can't tell the difference between an intentional change and a bad resolution...


> merge is what civilized people who care about getting history and context in their repository use

> I'll be darned if I understand the fear of the merge commit

I apologize in advance for not adding much substance in this reply, but I agree too much to just upvote alone.


Just curious... are you working in a team using git workflow?


Yes, my direct team is small of 4 devs but the main repo we work on is used by 100+ devs. We use git workflow (new branch for each feature) for the main repo and github style workflow (clone and then submit PR) for some other repos.


The number 1 reason my team has not moved from Subversion to Git is we can't decide what branching model to use. Use flow, don't use flow, use this model, use that model, no, only a moron would use that model, use this one instead. Rebase, don't rebase, etc. No doubt people will say that it all depends on the project/team/environment/etc., but nobody ever says "If your project/team/environment/etc. look like this, then use this model." So we keep on using Subversion and figure that someday we will run across information that convinces us that it is the one true branching model.


I have another solution: just switch to mercurial. I switched some big projects to mercurial from svn many years ago. Migration was painless, tooling was similar but better, the interface is simpler than git, and haven't regretted it once.


This is the path I took for a few projects years ago when Google Code didn’t support git.

Switched to mercurial from svn and workflow was painless for the team. Interestingly, we slowly started adopting more distributed techniques like developer merges being common. With svn, I think I was the only one who could merge and it would be rare and added product risk.

Then after about a year of mercurial we switched to git and our brains had adapted. Our team was small, 5-10 people.

Somewhat relatedly, in 2002, I worked in a large team of 75 people or so with a large codebase of a few hundred thousand lines of active dev. It used Rational ClearCase and had “big merges” that happened once or twice a release, with thousands of files requiring reconciliation. There was a team who did this, so it was annoying to dev in, but largely I didn’t care.

Company went through layoffs and the team was down to one. He quit, the company couldn’t merge, so couldn’t release new software versions.

There was a big crisis so they went to the architects and pulled a few out of dev work. It turns out I was the one who could figure it out and dumb enough to admit it.

That sucked. It took a few weeks to sort out and modify our dev process to make merges easy and common. But it was not fun. Upside is we ended up not having any “non-programmer” op/configuration management people, since the laid-off/quit team were ClearCase users, who didn’t code.

Moral- don’t let people know you can do hard, mundane tasks.


I have converted all my mercurial repos to git and I have forgotten all mercurial now. It helps me feel less pain when I am forced to work in Git....


> but nobody ever says "If your project/team/environment/etc. look like this, then use this model."

Honestly, its because a lot of it comes down to preference and what value you gain from using version control. It is very much like code style standards -- it doesn't matter what is in the standard so much as your teammates all using the same one.

If part of the blocker for your team is that no one is experienced enough with git to have a strong opinion, I'd be happy to brainstorm with you for an hour to learn about your current process and offer a tailored opinion.


Why not replicate whatever you are doing in Subversion in Git? You'll still be able to take advantage of the better merging algorithms, while maintaining whatever political momentum seems to be driving the team's decisions.


It really, really doesn't matter. That's one great thing about a distributed SCM.


We moved from SVN to Fossil and it has worked out great for us. The other option was Mercurial but it required Python.


If it is important to switch to Git, I suggest a technical leader, imbued with authority from management, make those decisions and just do it. However, I don't necessarily think a team should switch away from Subversion if it's working for them.


> everyone understands a different part of git and has slightly different ideas of how things should be done

This was a big problem that bugged me too, so for every team I've worked with I've created a few scripts for the team's most common version control operations.

Most devs, including me, are pretty lazy, so they'd all rather run the script than go to Stack Overflow to figure out git arcana.

This helps standardize conventions too: Feature branches/linear DAGs/topic branches/dev branches/prod branches/whatever weird thing a team does they all just do that using the script so it's standardized.


“rebase” is just “pull before push”, right?

While I have no opinion on git, I can’t abide by all the precious chaotic mutant misuse, like git-flow.

I’d happily accept a subset of primitives, if only to disallow bad ideas. Kinda like Git vs SVN, C/C++ vs Java, flamethrower vs peanut butter.


Rebase is "rewind local changes", "pull", "replay local changes".

Basically it makes it so that all of the local-only commits are sequenced after any remote changes that you have not seen yet.

[edit]

YZF is correct. In the context of pulling (i.e. "git pull --rebase") my description is correct. However in general rebasing branch X to Y that diverge from commit C is:

rewind branch Y to commit C; call the old tip of Y Y'

play all commits from C -> X on Y

play all commits from C -> Y' to branch Y.
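The X/Y/C description can be replayed concretely in a scratch repo (here Y is rebased onto X, both diverging from C; names match the comment):

```shell
set -e
tmp=$(mktemp -d); export HOME="$tmp"; cd "$tmp"   # sandboxed HOME: no real config touched
git config --global user.email demo@example.com
git config --global user.name demo
git config --global init.defaultBranch main
git init -q repo && cd repo
echo base > c.txt && git add c.txt && git commit -qm C    # common ancestor C

git branch X                          # X and Y both start at C
git branch Y
git checkout -q X
echo x > x.txt && git add x.txt && git commit -qm X1      # X moves on...
git checkout -q Y
echo y > y.txt && git add y.txt && git commit -qm Y1      # ...and so does Y

git rebase -q X Y                     # rewind Y to C, replay C..Y on top of X
git log --format=%s                   # Y1, X1, C -- one linear line now
```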


You can rebase between two local branches. The rebase operation has nothing to do with pull or remote vs. local.


Yes. I thought we were in the context of git pull --rebase...


"pull" might be the first thing I'd throw out, if I thought there was any hope of fixing git UX. Then add a working `merge --dry-run` ("do I have conflicts?").


I think a default of --ff-only would be fine for pull. This is great for when I'm merely a consumer of a project, and will never silently perform a merge or rebase.
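Anyone wanting this behaviour today can get it with a one-line config (a sketch; HOME is sandboxed here so nothing real is modified):

```shell
set -e
export HOME="$(mktemp -d)"             # throwaway HOME: config lands in the sandbox
# Make `git pull` refuse anything that isn't a fast-forward:
git config --global pull.ff only
git config --global pull.ff            # prints: only
# Per-invocation form, no config needed: git pull --ff-only
```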


Thanks (all) for the clarifications.

When explaining to others, I should probably say 'pull, reapply, then push'.

Perhaps 'rebranch' is a better word choice than 'rebase', to conceptually more closely match what's actually happening under the hood.


"rebase" is not just "pull before push", though.

It's pull then rewrite all your personal commits to be based on the latest tip from that pull.


rebase is simply(tm) replaying a sequence of commits (or diffs or patches for that matter) over some arbitrary base, hence re-base ...


rebase can do a lot more. Try `git rebase -i` to squash smaller commits, edit the commit msg, or even drop a commit before you push it to your colleagues.

Last time our devop did 20 commits to get something on elasticbeanstalk right, I squashed it all into just one clean commit that got merged into master branch.

It will help you to commit more often without worry until the moment you have to hand in your work.
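A scripted version of that squash (using `fixup` and `GIT_SEQUENCE_EDITOR` so it runs non-interactively; normally you'd just edit the todo list by hand, and the commit messages here are invented):

```shell
set -e
tmp=$(mktemp -d); export HOME="$tmp"; cd "$tmp"   # sandboxed HOME: no real config touched
git config --global user.email demo@example.com
git config --global user.name demo
git init -q repo && cd repo
echo base > f && git add f && git commit -qm base

for i in 1 2 3; do echo "try $i" >> f && git commit -qam "eb attempt $i"; done

# Mark every commit after the first as a fixup (squash, keeping the first message):
GIT_SEQUENCE_EDITOR="sed -i '2,\$s/^pick/fixup/'" git rebase -q -i HEAD~3

git log --format=%s                  # "eb attempt 1", then "base"
```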


Rebase is a controversial history-altering operation and makes it easy to paint yourself into a corner and get weird error messages or wrong results. It's very different from pull/merge.


History altering is only controversial on things that are published. There is nothing wrong with reordering, combining or splitting your local commits to give more clarity to what you are doing. Keeping this in mind will give you the freedom to commit frequently.

This confusion happens because many popular SCMs historically have the "commit" and "push" operations in a single step. Git keeps them separate.


There is no tracking by git on what is published, so it's easy to make the mistake of rebasing things that are published and shared by others. Then you will have a bad time later when you try to sync with others, possibly days later.


Um... git kind of does with remote tracking branches. You can also make it very obvious by your workflow? If you use local feature branches (which you should for juggling between development tasks, etc.), what you are working on vs what's upstreamed should be pretty clear. Sounds like you are not using local branches.

Not using local branch is another confusion caused by the perspective of historical/traditional SCMs (people thinking branches are the domain of a centralized server and are outside of their control.)


Often you want to push changes to a remote, but not yet merge or PR them to upstream.

Keeping "local feature branches" just on your dev machine is bad for many many reasons:

- you want to encourage low barrier cooperation in your team -> sharing changes

- you want changes to the CI pipeline early so the potentially slow testing machinery works in parallel with the developer

- you want to keep the team up to date on what changes you make

- you don't want to lose work if the machine/OS dies, or the developer leaves/becomes sick/goes on a 4 week vacation during which they forget their disk crypto password

So, in practice you can try to use rebase opportunistically, when by chance your WIP work is still unpushed because the change was only made very recently. This is error prone. Or you can rebase published branches explicitly, by destroying the original branches in the PR merge phase. But all this is a big bother if the purpose is just to beautify history and hide the real trial and error that went into making the changes.


Did you notice that y2kenny was talking about how, if you use local feature branches, then the remote tracking branches make it really clear what's been published vs not? The implicit meaning is that we should use local feature branches but also publish them to the repo while we're working on them.

But maybe to you, 'publish' means 'publish to master'? In that case I can assure you, they are not necessarily the same thing. I regularly work on a local feature branch, publish that branch to the shared repo, rebase it on top of master, then force-push to the shared tracking branch. When I'm done I merge it into master and don't rebase master on top of anything.


who said anything about not publishing?


I'm not sure if you are being serious? The answer is that published advice on rebase overwhelmingly warns against rebasing published code, and for good reason.


Who said anything about rebasing published code?


I LOVE rebase, but when I run into merge conflicts I'd rather `rebase --abort` and leave that merge commit as it is. But those instances are rare, and having a merged branch's commits nice and compact in the log makes me happy every time.


Nobody understands SVN or CVS either.

I discovered this supporting SVN servers for whole bunch of developers.


I always found the mercurial ui super easy.

The error messages are clearer, it is multiplatform, all the advanced functionalities are there, a nice graphic interface exists.

I really do not understand why git won, apart from github.


What I find ironic is that github is massively popular as a central way to use a distributed version control system. The distributed nature only adds to the complexity and I am sure it is only used by a fraction of git users.


Yes...? What's surprising about using a central repo to collaborate? There needs to be a single source of truth for a coherent project, otherwise you're just going to have chaos.

The distributed nature of git led to the simple and secure contribution model of everyone working on their own repos and not needing to give write access to anyone else. This pretty directly led to an explosion of open source software.


Is there any really good tutorial on git that teaches the internal model? Ideally, it would illustrate each command and show the before and after of the internal objects.


https://learngitbranching.js.org/ is the best guide I've seen. It shows you the complete commit graph and all refs on that graph, and updates the graph when you type in commands. It covers and displays workflows involving remotes as well.

If you don't want the tutorial, you can go straight to the sandbox here: https://learngitbranching.js.org/?NODEMO


Indeed. When the article said "younger developers only know git" I immediately thought, no, they don't know anything. These people don't even know what a DAG is. Git was made for people who know these concepts. I've tried explaining git to people and they just don't understand. They just don't.

What's annoying is that git is just expected knowledge these days and having a github account is enough to claim it. There's not a good way to sell the fact that you're a bit more into it than that.

I've even said to git "experts" that branches should really be called refs and their eyes glaze over. It's difficult for me to understand what git is in their heads.


Why would you call branches refs? They don't point to specific files or commits.

I know you can target commits through them - which utilizes the ref syntax... But they're still not really referencing anything directly.

They're completely arbitrary and are just a feature to improve git's workflow.


I started naming branches 'post-its', as to me that's what they are: labels you place on the real 'branches' (the commit tree). You can take them off easily, move them, discard them, whatever you want. They are just volatile.


I should have said pointers. I didn't mean to overload existing git terminology. My point was just that they are pointers/references to some commit.


> They don't point to specific files or commits.

A branch points to the tip(last commit) of a particular timeline.


But they are also called symbolic refs in git terminology...


A symbolic ref is a ref that points to another ref instead of a ref that points to a commit. `HEAD` is a symbolic ref. (It should be your only symbolic ref.)
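All of this is directly inspectable on disk (with the default loose-refs storage at least):

```shell
set -e
tmp=$(mktemp -d); export HOME="$tmp"; cd "$tmp"   # sandboxed HOME: no real config touched
git config --global user.email demo@example.com
git config --global user.name demo
git config --global init.defaultBranch main
git init -q repo && cd repo
git commit -q --allow-empty -m init

cat .git/HEAD                        # "ref: refs/heads/main"  <- symbolic ref
cat .git/refs/heads/main             # a raw commit hash        <- plain ref
git symbolic-ref HEAD                # refs/heads/main
git rev-parse main                   # same hash as the loose ref file
```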


Unless it is detached. :)


That term makes sense.

But just as you wouldn't call a symlink to a zip archive a zip file itself, you also shouldn't call a branch a ref.


Hrm, but a ref is a file containing a hash, right? So if the hash is equivalent to the file, then surely a ref is equivalent to a symlink? A symbolic ref, in turn, should be a symlink to a symlink... Or something like that...


A ref points to an object. That object doesn't change unless the hashing algo was tricked.

A branch points to anything you want it to point to. It can be any ref you want and can be changed at will.


sha1 - object (e.g. 5a480efb...)

file with sha1 - ref (e.g. master)

file with ref - symbolic ref (e.g. HEAD)

right? Seeing as you can git update-ref branches, but you need to git symbolic-ref HEAD.


But it is a ref. It's an alias for the last commit of a particular timeline, as I said above.


So would you rather say a branch is a commit?


A branch is a pointer or symlink if you will.


> It's difficult for me to understand what git is in their heads.

In that case, they were thinking the git was you.


Git is the solution to the problem of doing distributed development on the Linux kernel. People who aren’t doing that, I wonder if they’re entirely clear in their own minds why they use it. I’m certainly not... other than that it’s just the default choice these days, the path of least resistance...



