this post was submitted on 24 Jul 2024
16 points (100.0% liked)

Git


Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.


I think it's generally agreed that large files that change often do not belong in a Git repository, while small files that never change are fine. But there's still a lot of middle ground where the answer is not so clear to me.

So what's your stance on this? Where do you draw the line?

top 10 comments
[–] [email protected] 9 points 3 months ago

The main downside is Git downloads all history by default, and so any large files will bloat the download for people cloning your repo forever. It isn't about binary vs text. It's just the size that matters.
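To see which files are actually responsible for that bloat, something like the following rough sketch could help. It assumes `git` is on the PATH and chains two real commands, `git rev-list --objects --all` and `git cat-file --batch-check`; the function names `list_objects` and `largest_blobs` are made up for this example.

```python
import subprocess
from typing import List, Tuple

# Format string understood by `git cat-file --batch-check`.
# %(rest) echoes whatever followed the object ID on the input line (the path).
FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def list_objects(repo: str = ".") -> str:
    """Return one line per object in the full history of `repo`."""
    rev_list = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    )
    batch = subprocess.run(
        ["git", "-C", repo, "cat-file", f"--batch-check={FORMAT}"],
        input=rev_list.stdout, capture_output=True, text=True, check=True,
    )
    return batch.stdout


def largest_blobs(batch_output: str, top: int = 5) -> List[Tuple[int, str]]:
    """Parse the batch-check output and return the `top` largest blobs
    as (size_in_bytes, path) pairs, largest first."""
    blobs = []
    for line in batch_output.splitlines():
        parts = line.split(" ", 3)  # type, id, size, path (path may contain spaces)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees and malformed lines
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), path))
    return sorted(blobs, reverse=True)[:top]
```

Calling `largest_blobs(list_objects("."))` inside a clone lists the biggest files ever committed, including ones deleted long ago. The sizes are uncompressed object sizes, so they only approximate what a clone actually transfers.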

[–] [email protected] 8 points 3 months ago (1 children)

If it's a build artifact, put it in a registry. If it's a resource-type file, Git LFS can be used, as long as there isn't an absolute ton of them.

[–] [email protected] 2 points 3 months ago

This. If the file can be generated from the repository it should not be put inside it, but if you need it to build the project it should (unless it is an easy to install external dependency that should be declared in a Readme file).

[–] [email protected] 4 points 3 months ago

FYI, there's a fun project called git-annex that is designed for syncing large files and uses Git under the hood. Fun fact: it's written in Haskell.

[–] [email protected] 4 points 3 months ago

I don't like it, but if they're part of the project files, then they belong in version control. I do worry about the challenges of combining the difficult-to-merge nature of binaries with the distributed workflows that Git encourages. While data doesn't get lost, the inability to merge them may mean that someone needs to spend extra time re-performing their changes if they "lose" the push/merge race.

Game engines have been doing a better job of transitioning away from large monolithic binaries by either serializing them in somewhat mergeable text files or at least splitting them into large numbers of smaller binaries to reduce file contention.

Git LFS does offer the ability to offload them from the repository, which reduces download and checkout times, as well as the ability to lock files (which does introduce centralization...). But it doesn't seem to be as ubiquitous, and it can be more expensive to use, depending on the team's options for Git repo providers.
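For reference, Git LFS decides which files to offload through entries in `.gitattributes`, normally written by `git lfs track`. Here is a minimal sketch of what those entries look like; the helper name `lfs_attribute_line` is hypothetical, and it does not handle patterns containing spaces, which need extra escaping.

```python
def lfs_attribute_line(pattern: str, lockable: bool = False) -> str:
    """Build a .gitattributes entry that routes `pattern` through Git LFS.

    Mirrors the line written by `git lfs track "<pattern>"`; with
    lockable=True it also adds the attribute that marks files for locking.
    Assumption: `pattern` contains no whitespace.
    """
    attrs = ["filter=lfs", "diff=lfs", "merge=lfs", "-text"]
    if lockable:
        attrs.append("lockable")
    return " ".join([pattern] + attrs)


# Example: one entry for Photoshop files, one lockable entry for game assets.
# The patterns here are only illustrations.
lines = [
    lfs_attribute_line("*.psd"),
    lfs_attribute_line("*.uasset", lockable=True),
]
```

Joining `lines` with newlines gives content that could be appended to `.gitattributes`. In practice, running `git lfs track` is the safer route, since it takes care of the escaping.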

Note: I assume you mean binaries as in "non-text files", not build artifacts, which definitely don't belong in version control at all.

[–] [email protected] 2 points 3 months ago

I think the only binaries I have are tiny samples used by a couple of tests in that repo. I generally try to avoid them altogether.

[–] [email protected] 2 points 3 months ago

I'll go to quite a bit of effort to avoid them. Arguably too much effort, but I often find that the path that avoids them is also useful in other ways.

For example, for a personal project, I automated rendering a PNG fallback icon from an SVG, so now I can have as many different resolutions as I want and don't need to manually update them if I want to tweak the icon.

I'd also like to publish a screenshot of the project. The simple solution is to check a PNG into the repo and link it in the README.md. What would be a lot nicer, though, is to set up a project webpage. With Codeberg Pages that isn't even much effort, but I would have less motivation to do it if I had just checked in the PNG.

[–] [email protected] 1 points 3 months ago* (last edited 3 months ago)

I think assets like app icons are ok. They rarely change, and are often quite small. It’s convenient to have those kinds of things bundled together with the code.

[–] [email protected] 0 points 3 months ago (1 children)

Never do this.

Git is all about tracking changes over time, which is meaningless with binary files. They are bloat for your repo, slowing down operations. Depending on the repo, they are likely to change from CI with every commit. That last one means that every commit turns into 2 commits, btw. They can ruin diffs. I could go on for a long time here.

There are basically 0 upsides. Use an artifact repository instead!

[–] [email protected] 6 points 3 months ago

"Git is all about tracking changes over time which is meaningless with binary files."

Utter codswallop. You can see the changes to a PNG over time. Lots of different UIs will even show you diffs for images.

Git can track changes to binary files perfectly well. It might not be great at dealing with conflicts in them but that's another matter.

The only issue is that binary files tend to be large, and often don't compress very well with Git's delta compression. It's large files that are the issue, not binary files. If you have a 20 kB binary file it's going to be absolutely fine in Git. Likewise a 10 GB CSV file is not going to be such a good idea.
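If you wanted to enforce that in a pre-commit check, the rule would look at size only and ignore file type. A minimal sketch, where the function name `oversized_files` and the 50 MB default limit are arbitrary choices for illustration:

```python
import os
from typing import Callable, Iterable, List, Tuple


def oversized_files(
    paths: Iterable[str],
    limit_bytes: int = 50_000_000,  # arbitrary example threshold
    size_of: Callable[[str], int] = os.path.getsize,
) -> List[Tuple[str, int]]:
    """Return (path, size) for every file larger than `limit_bytes`.

    The check deliberately ignores whether a file is text or binary.
    `size_of` can be swapped out, e.g. to try the function without real files.
    """
    flagged = []
    for path in paths:
        size = size_of(path)
        if size > limit_bytes:
            flagged.append((path, size))
    return flagged
```

With the sizes from the comment above, a 20 kB PNG passes and a 10 GB CSV is flagged, even though the CSV is plain text.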