The sad state of sysadmin in the age of containers (2015) (vitavonni.de)
428 points by maple3142 on Dec 9, 2019 | 311 comments



Ever tried to security update a container?

The whole point of using a container is that you can destroy it and build a new one easily. The new one should be built using up-to-date packages with security patches applied (and tested, obvs). Using the 'pets versus cattle'[1] analogy, patching a container feels like you're treating it like a pet. You should just kill it and get a new one instead.

[1] https://thenewstack.io/how-to-treat-your-kubernetes-clusters...
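
A sketch of what that looks like in practice, with a hypothetical `myapp` image; the important flags are --pull and --no-cache, so the rebuild actually picks up the current base image and current package versions:

    # Rebuild from a fresh base image with today's packages, then swap the
    # running container instead of patching it in place.
    docker build --pull --no-cache -t myapp:$(date +%F) .
    docker stop myapp && docker rm myapp
    docker run -d --name myapp myapp:$(date +%F)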


The whole point of having a distro with a good reputation is that you can incrementally outsource your trust to them, one package at a time. OpenSSL needs a fix? That's one package update per machine, and if you have lots of machines, you roll that one update out to all of them.

With VMs, you can use the same tools you use with any other machine.

With containers, you are responsible for figuring out which ones need which packages and how to re-build them. Your method is likely to be idiosyncratic, so you not only need to be your own security team, you need to be your own packaging maintainer and infrastructure maintainer.

The two extreme versions of this are the neglect model, in which you just don't care about anything, and the bleeding-edge model, where you automatically build with the latest versions of everything pulled from their origins. Both ways end up with unexpected security disasters.


I don't understand why you think it's hard to tell your CI/CD pipeline to run itself on a schedule. I find this totally consistent with my tooling (I have CI/CD in place for many different reasons, this is one of them) and I don't know how this CI/CD cron makes me "packaging maintainer and infrastructure maintainer".

"With containers, you are responsible for figuring out which ones need which packages and how to re-build them."

Yeah, you need to know how to rebuild your world in ANY CASE. It's not an argument against containers that they require an approach that is the best practice.


The "one package at a time" model assumes you care about per-instance uptime and bandwidth costs. The "reset the world" model assumes rebuilds and reboots are cheap enough not to have to.

I don't think either is necessarily wrong; there's not a huge distance between `apt-get install unattended-upgrades` and a daily (weekly, whatever) container rebuild except that you need to cron the latter. Unless you're using a different OS between the two cases, they can use the same package database, the same tooling, and so on.
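
Concretely, the difference is roughly this; the rebuild script name below is a placeholder for whatever kicks off your image build and redeploy:

    # VM flavour: let the distro handle security updates automatically.
    apt-get install unattended-upgrades

    # Container flavour: a /etc/cron.d entry for a nightly rebuild.
    0 2 * * * builder /usr/local/bin/rebuild-and-redeploy.sh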


You need to make the decisions, keep track, and do the work. If you've got the tools made for you, excellent. Otherwise you also have to build and maintain the tools -- which is my point.


There are no more decisions to make one way or the other. If you're keeping a VM image updated with new packages, that's either a) automated with something like unattended-upgrades, or b) a manual scan of DSAs with decisions as to whether they're important enough to apply and which boxes they need to be applied to. If you're rebuilding a container image, that's either a) a 2am cron job to do a blind rebuild that captures the latest package versions anyway, or b) a manual scan of DSAs with decisions etc etc...

The 2am cron job, if you go that way, is, assuming you've got an automated build anyway, a one-line bit of config somewhere. It's not something I'd usually think the words "build and maintain" would apply to.


If you can build on your distro of choice, you can rely on the same security audit and process. Keep track of the packages you link against and include in your image, and just rebuild your images when a new security update comes along. Then you replace the running one.


See that "keep track" bit? It's expensive unless you build a tool to do it for you -- which is what I said previously.


Pulling the list of security updates is trivial. And during development you'll know what you depend on. Comparing the two in a script is trivial.

But regardless, with containers, it should be a setup where you can rebuild your images daily with all the latest security fixes. Or whenever there is a fix you deem important.
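
A rough sketch of one way to do that check on a Debian-based image; output formats differ between releases, so treat the grep as illustrative:

    # List pending upgrades and flag anything coming from the security archive.
    apt-get update -qq
    apt list --upgradable 2>/dev/null | grep -i security \
        && echo "security fixes pending, rebuild the image" \
        || echo "nothing security-relevant pending"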


You don't have to keep any more track than you do with your OS; it's just a different button you hit when updating containers.

Or don't keep track at all: Just rebuild and redeploy often enough.


Except the OS is one place, one org. With containers it could be exponential.


Supply chain attackers love this.


The easiest analogy is that containers are like binaries. If you find a problem in nginx, you don't go on and patch /usr/bin/nginx, you uninstall that one and replace it with a binary that has the problem fixed.


It depends on where you're getting the new one from.

You could be getting mad cows every time.

I think it would be nicer if you could build containers from scratch that didn't tie into a cloud based website or someone's business model.

Sort of like a Dockerfile, but for the prerequisites. Maybe go even deeper to gentoo level.


> I’m not complaining about old-school sysadmins. They know how to keep systems running, manage update and upgrade paths.

Sadly these ways of olde didn't scale. They could only do it for a handful of servers, change requests took months and the systems were not to be 'touched'.

> And since nobody is still able to compile things from scratch, everybody just downloads precompiled binaries from random websites. Often without any authentication or signature.

Right, that would have never happened back then, when packages were not signed and freshmeat.net was still a thing. For some reason compiling C code from some website (`wget;./configure;make;make install`) seems to be more secure than `go install`, maybe because nobody understands automake anymore.


> They could only do it for a handful of servers, change requests took months and the systems were not to be 'touched'.

Those old ways of doing things were indeed slower than they needed to be in many cases, and I'm glad that we sped them up. I don't believe that we needed to throw the baby out with the bathwater though.

The problem now is that we build systems which are unmanageable and unmanaged. 'Throw it in a container and let it run' is not scalable either: it scales neither security nor maintainability. Instead of scaling, it just punts.

Enabling one to do something one shouldn't do (i.e., deploy insecure software into production) is not a virtue.


After a few decades it's no longer a baby ;)

Software is becoming more complex and moving faster than ever. 'Controlling your dependencies' was always an illusion and a trade-off at best. Often a trade-off against security, inheriting the folder of JAR files from your predecessor and such.

I think of containers and all these new tools as designs which are supposed to help us manage existing complexity. They don't create it, it's already there in the requirements and real-world deployments. The friction we feel is that some of those tools are not very good (yet), but maybe better than the previous generation (sometimes). On the negative side, there is a lot of nostalgia ("ah, do you remember bare metal...") and unwillingness to change and learn.


I think a lot of folks who have been through a few tech cycles just see us building the same tools over and over again adding complexity to the overall system.

I think the main cause here is, ironically, the fact that businesses keep trying to remove sysadmin (as a discipline) from the IT value chain and replace it with subscription services. Sure, you can do it, but at some point things will break and you need someone who understands what is happening at the lowest levels of abstraction.


> The problem now is that we build systems which are unmanageable and unmanaged

Are all the systems out there in production actually unmanageable and unmanaged?


You've got to look at this in a context where platform package managers like apt are simultaneously 1) platform-specific 2) jealous, insisting that every language has to conform to their way of doing things and 3) fundamentally not very good, having very limited ability to do things like install packages for a single user or install multiple versions of the same package. Platform package managers like maven have been far more successful because they provide a much better user experience, and frankly I suspect this is if anything because they're so thoroughly isolated from the platform package manager.


That's quite a negative view of the situation. The Debian maintainers definitely don't insist that everything has to be done the apt way. They only insist that packages for the official repos do so, which seems sensible enough to me. Everyone is still free to build their own deb packages or distribute their applications through install scripts or app images or any one of a hundred thousand different ways to distribute programs.

But if you want a package in the official repos you have to play by the rules they set, the primary one being that official packages ought to only depend on official packages. It'd simply be impossible to have any guarantees about the quality of a package otherwise.


Hear, hear. And the only thing that makes Debian an interesting brand is the consistent quality of the official repository.

If someone doesn't like apt & the main repository they shouldn't use Debian, because that is pretty much the only big thing separating Debian from the tens to hundreds of other distros out there. Take apt and the main repo out of Debian and it is basically Linux From Scratch with a bigger team of bugtesters. Distributions aren't 'jealous' about package management, they are the package management. It is right there in the name "Debian distribution". The Debian brand, and all Linux distro brands, are deeply linked to how they execute package management to distribute software.


It's not just a question of being in the official repos. Even if you're running a private repository, the deb tooling is not suitable for building many kinds of software: it's unwilling to play nice with "external" dependencies, and still single-platform and limited in its functionality.


> unwilling to play nice with "external" dependencies

But that's a policy decision - the dependencies should themselves be packages, because that's the only way to guarantee reliability.

The inability to install multiple versions is probably the only serious problem with the dpkg model. Even then we might argue that the dependent software should be fixed so that it's not so fussy about specific dependency versions.


Installing multiple versions is possible, given proper semver and good arguments for it: There are quite a few libfoo5, libfoo6, libfoo7 packages. What isn't and shouldn't be possible is installing libfoo 6.6.5 and 6.6.6 at the same time, because those should be compatible. And when versions aren't easy to swap in (i.e. ABI- and API-compatible), security and bug fixes end up being missed.

Of course this is work for application as well as library developers, designing and keeping compatibility is hard. Most would rather build the new shiny, consequences be damned.
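
Using the hypothetical libfoo from above, the distinction looks roughly like this on a Debian-style system:

    # Two sonames are two packages and can be installed side by side:
    apt-get install libfoo6 libfoo7
    # Two patch releases of the same soname cannot coexist; 6.6.6 simply
    # replaces 6.6.5 on upgrade:
    apt-get install --only-upgrade libfoo6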


I think these are noble but naive approaches - we have been developing software for a few decades now and sometimes the software is not compatible without the developer knowing it. You should be able to use whichever version of the software you need.


"Whichever version you need" comes with the responsibility of providing security and bugfix upgrades in a timely fashion. Almost all developers fail spectacularly at this, with latencies in the order of months. This is clearly not viable.


As we are all aware, the most secure software is software that doesn't work at all because of library conflicts.


This was my experience as a developer on a SOA team that provided APIs for the rest of the organization (a large regional healthcare system). Even changes that didn't officially break the API could break our customers in ways we didn't anticipate.

When I left that position, we were in the middle of an OpenShift (k8s) deployment, and one of the things motivating the change was the ability to more easily run separate versions of services for different customers if the need arose. Yes, this would put a lot more burden on us as the service provider, but it would also allow us to iterate faster while maintaining stability.


> sometimes the software is not compatible without the developer knowing it

Sure, but this should be considered either a defect or something highlighted by a major version number jump.


Yeah, but who decides if it's a defect? Maybe the defect is subtle and only manifests in rare cases and the upstream rightfully decides that after a risk/payoff analysis, it's not worth fixing it.

Or maybe he agrees and fixes it and it's included in distributions 7 years later.

I've changed 5 jobs in 7 years (because of life circumstances) and I'm not even a job hopper... How is any commercial shop going to plan around 7 year time frames?


Well, within a Debian distribution the Debian developer responsible for the downstream package would get the dependency fixed, either themselves or by badgering the maintainer, and use the fixed Debian version. This gives Debian a consistent set of packages that work together. Yes, it's a lot of work and it's why Debian is behind other distributions and doesn't include Hadoop. But it reduces the unpleasant surprises.

It's effectively part of insisting that it's properly Free software - if you can't maintain your own bugfixed fork, but have to keep going to an external organisation for their version, is it really Free?


> The inability to install multiple versions is probably the only serious problem with the dpkg model.

Also can't install to separate disks.


That's unlikely to be a real blocker. Worst case: examine where the package will put its files, and prep your system with symbolic links to directories residing on a file system on the other disk(s).

But that raises the question: why are your file systems sensitive to disk drives? Use a layer (or more) of indirection through LVM and RAID to make the drive placement opaque and robust.
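
A rough sketch of the symlink workaround, with made-up package and mount names:

    # See where the package will put its files (for an already-downloaded .deb).
    dpkg-deb -c bigpackage.deb | less
    # Pre-create the heavy directory on the larger disk and link it into place,
    # then install as usual.
    mkdir -p /mnt/bigdisk/opt/bigpackage
    ln -s /mnt/bigdisk/opt/bigpackage /opt/bigpackage
    apt-get install bigpackage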


> why are your file systems sensitive to disk drives?

A couple of systems I have at home have a 16GB primary disk, so it is useful to be able to put things on an external disk. I have yet to see any package manager that can actually handle that.

> Use a layer (or more) of indirection through LVM and RAID...

Yeah, that's a great idea, just introduce more complication and abstraction to make up for a shortcoming in how Linux developers think about applications.

Or, just have your programs be self contained in a single file or folder and put them wherever it makes sense.


> it's unwilling to play nice with "external" dependencies

This baseless assertion is simply outright wrong. See for example Debian's docs on Private Package Archives:

https://wiki.debian.org/DebianRepository/Setup
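
Setting one up on the client side is a couple of lines; the host name, keyring path and suite below are placeholders:

    # /etc/apt/sources.list.d/internal.list
    deb [signed-by=/usr/share/keyrings/internal.gpg] https://apt.internal.example.com/debian stable main

    # After that, the normal tooling applies.
    apt-get update && apt-get install some-internal-package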


Hear hear. If anything .debs don't care enough where dependencies come from. As long as the package name matches you are good. I have no complaints about Debian packaging. I've used lots of different ones. All official packages are signed and it's easy to sign your own packages.

Maven, on the other hand, is plain scary.


Language-specific package managers are also "jealous" as you put it. They insist that every platform conform to their way of doing things. They're also promiscuous, in the sense that most have really poor dependency management ("let's build the internet today just in case") and practically no security. The computing world would be a better place if the people forcing their language-specific idioms on the rest of the world had instead spent their time improving the platform package managers that already existed at the time.


> They're also promiscuous, in the sense that most have really poor dependency management ("let's build the internet today just in case") and practically no security.

Not criticisms that apply to maven. Binary dependencies on released versions are the norm, and all packages in the central repository are signed with GPG, just as with apt.

> The computing world would be a better place if the people forcing their language-specific idioms on the rest of the world had instead spent their time improving the platform package managers that already existed at the time.

If those platform package managers had been open to being improved, which I don't think they were (and conversely, deb/apt originated because Debian insisted on doing things their own way rather than improving RPM). Apt's closed-world assumptions seem like a policy decision. Apt not being cross-platform is definitely policy. And surely it's occurred to people in Debian that the ability to have per-user installs, and more importantly non-shared/non-overlapping installs of libraries that are depended on, would be useful; the insistence on single system-wide installs of libraries can only be a policy decision, and one that's turned out to result in a less usable system.


So, what would 'pip install s3cmd' look like on Windows or macOS?


   choco install s3cmd

   brew install s3cmd
https://formulae.brew.sh/formula/s3cmd


What is the point of your question?


> jealous, insisting that every language has to conform to their way of doing things

Let's ignore for a moment all your baseless assertions and focus on the following question: how do you ensure that a build is reproducible if you have no control over which required dependencies are downloaded by a build system which requires root access?

Is jealousy the only explanation that occurs to you?


Oh, check out Nix https://nixos.org/nix/

I use it for most of my dependencies, and software in OS X/Windows WSL/Linux

It is a lovely environment to work with. Has completely changed how I feel about package management and the like.

I know you were asking figuratively but something does in fact slot into that space.
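
The day-to-day usage looks roughly like this, assuming Nix itself is already installed ('hello' just stands in for whatever package you actually need):

    # Install into the current user's profile, no root needed.
    nix-env -iA nixpkgs.hello
    # Or get a throwaway shell with the package available, without installing it.
    nix-shell -p hello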


> Oh, check out Nix

Nix is a non sequitur. Nix requires specific packages and package versions to be specified, and Nix requires the build process to be "free from side effects". That fails to address both problems stated by the OP, and actually tries to reimplement what Debian's package building process already does. Other than trying to publicize Nix, your comment adds nothing to the discussion.


>how do you ensure that a build is reproducible if you have no control over which required dependencies are downloaded by a build system which requires root access?

By having control over which required dependencies are downloaded. As Nix does. Why would you settle for not solving that problem?

Sorry to hear you don't think my comment adds to the discussion, but I'd disagree


> By having control over which required dependencies are downloaded.

Again, Nix adds nothing to the discussion. The problem stated by the OP was caused by the (broken) way that the custom build system of a specific package was designed to work, which failed to adhere to the standard practices enforced by Debian's packaging process.

If the packagers of said software project followed Debian's practices then the problem wouldn't exist.

You're parroting that Nix also enables users to specify dependencies. Ok, so it works just like any other package system. What does that have to do with the problem being discussed? Nothing.


It feels like you're being unnecessarily hostile, here. He didn't address your specific problem as stated, but he did add to the discussion.

I don't think it's fair to say that Nix is 'just another package manager'. It solves many of the stated problems with distro package managers (overlapping versions, user-specific packages, strict build environment rules), and provides many of the benefits of Docker and its ilk (perfectly reproducible environments for packages to run, including dependencies that don't fit well into traditional package managers, like JARs). Because it doesn't rely on the standard POSIX filesystem layout, it runs happily on any Linux or OSX system, alongside whatever package manager your system uses.

If people started using Nix recipes instead of Docker or janky bash scripts for deployment of Hadoop and other complex software, most of this article's complaints would disappear. And Nix is, if anything, better from a sysadmin point of view than apt-style packaging systems. It makes a fine distro package manager (see: NixOS).


> It feels like you're being unnecessarily hostile, here. He didn't address your specific problem as stated, but he did add to the discussion.

You're right, sorry for the tone. The thing is, it sounded an awful lot like a blatant attempt at derailing the discussion by shoehorning in spam to promote a build tool. Nix does not solve anything, particularly as it was being proposed as a solution to a problem that plain old Debian packages do not have. So if Nix solves nothing and Nix adds nothing to the discussion, then why waste everyone's time by adding noise to sell a tool that does and solves nothing wrt plain old Debian packaging?

> If people started using Nix recipes instead of Docker or janky bash scripts for deployment of Hadoop and other complex software, most of this article's complaints would disappear.

...or simply build a plain old Debian package?

Is it that hard to simply follow the happy path of packaging for Debian?

Why is suddenly Nix the only option on the table, especially as it brings absolutely nothing to it wrt what plain old Debian packages already provide for decades now?


I like Apt a lot, but it does have some shortcomings.

First, of course, it requires an apt-based distro, so software distributors need to have apt alongside all the other packaging alternatives. Or they just provide a bash script. Ew.

Second, apt doesn't elegantly handle different versions of the same package. That's rarely an issue for well-established C libraries, but it's a big issue for Java and most of the dynamic languages. So you end up with a host of language-specific package managers.

Third, there's stuff beyond simple files that falls outside the wheelhouse of apt. Networking, configuration, whatever. The paradigm for apt is very much to have a small number of systems, manually curated by dedicated sysadmins. When you start scaling up to tens, hundreds, and thousands of hosts, you end up writing and maintaining long scripts to initialize a freshly-installed system and put it in the right state. And those script will break as packages evolve. Getting a system into a known-working state is difficult.

A specific example: I installed and set up GitLab on a Debian system a while back, and it was a huge pain. It's not just a package, after all, it's web code, a couple daemons, a sql db, a redis db, git repos on the filesystem, and more. The install guide was pages and pages long. I never quite got it working right (something about SSL certs, IIRC one of the daemons wasn't using the system CAs?).

So I tried docker for the first time, and had GitLab up and running in about 10 minutes.
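
For the curious, that 10-minute setup is more or less one (long) command; the ports, host paths and tag here are illustrative rather than exact:

    # Omnibus GitLab in a single container; all state lives in the mounted
    # volumes, which is what makes backup and migration straightforward.
    docker run --detach --name gitlab \
        --hostname gitlab.example.com \
        --publish 443:443 --publish 80:80 --publish 2222:22 \
        --volume /srv/gitlab/config:/etc/gitlab \
        --volume /srv/gitlab/logs:/var/log/gitlab \
        --volume /srv/gitlab/data:/var/opt/gitlab \
        gitlab/gitlab-ce:latest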

And if I ever wanted to migrate to a different host, spin up another node for load balancing, or do backups and restores, you bet your ass I'd use docker.

Apt is great for carefully curated, individual systems. It was perfect for the world circa, say, 2005, and the world would be better off if we'd all standardized on it then. But even if we had, somebody would've invented something like docker in the meantime, for managing complex software (like GitLab) on tens, hundreds, or thousands of hosts.

But docker has all the issues pointed out in the article above, and more besides (every image is hundreds of megs, because it carries a full Linux userland along with it...that's just crazy).

Nix can do the package management thing that apt does so well, and it can also do the reproducible, holistic system build thing that Docker does. It can also make management of language dependencies (i.e. Java JARs) much more clean and elegant.

It's seriously worth checking out.


> That fails to address both problems stated by the OP

This is wrong and unfair. One of the major problems the OP has with Docker is that many software builds are unreproducible, encouraging many people to deploy binaries of dubious quality. Nix tries to solve precisely this problem through better tooling which makes it easy to ensure that its packages are reproducible and its dependencies easily verifiable. This is also what makes Nix distinct from Debian, which tries to improve package quality through policies and community collaboration.

> Nix requires the build process to be "free from side effects"

I disagree that this makes Nix irrelevant to the discussion at hand. Another gripe the OP had with Docker is that it sandboxes entire apps, making it a blackbox. Nix does sandboxing on a more granular level, which provides more transparency into individual packages.


Reproducible builds were a non-goal for apt for most of its existence.


> Reproducible builds were a non-goal for apt for most of its existence.

Your comment sounds very disingenuous, as Debian was first released in 1993 (about 26 years ago) and has been pushing for reproducible builds since 2000.

https://wiki.debian.org/ReproducibleBuilds/History


Your own comment is extremely disingenuous. Your own link states that there was very little enthusiasm for reproducible builds when mentioned in 2000 and 2007, and the first serious effort towards reproducible builds started in 2013.


That doesn't diminish their immense benefits. Besides, what's the source of your claim? Package quality, ease of install, and security were always high-priority topics for Debian AFAICT.


Decades of experience using Debian. Package quality and ease of install were indeed always a priority (and security may have been claimed as one), but reproducible builds were not seen as necessary for any of those.


You can use a tool like snap or appimage to pull down the newest full release of your software. At least then, all the dependencies are 'bundled' into the image, and you're not having people go randomly install crap from around the internet with no real way to update.


Building debian packages from source definitely does not involve "randomly installing crap from around the internet".

The .sdeb format packages up sources and build scripts into one atomic, build-able unit.


The only .sdeb archive format I know of is the one for debbuild[1]. Most Debian package build mechanisms don't have a source package archive format.

[1]: https://github.com/debbuild/debbuild


Jealousy is the characterisation of exactly that behavior/aspiration to have everything managed through the OS's package manager. It's a nice effort, but for people not firmly within the ecosystem it's often hard to justify packaging something specifically for every distribution.


> 2) jealous, insisting that every language has to conform to their way of doing things

Well, programming languages clearly should not be in the business of software distribution. That creates unnecessarily tight coupling between language, build system and distribution, and causes a proliferation of language-specific package managers incompatible with the platform's way of doing things.


Something Java and the Java ecosystem are terrible at. I've had a project where we needed all of Java 7, 8, 9 and 10 [0] for the data stack: Hadoop, Scala, Spark, HDFS, zookeeper, pyspark. Good luck setting that up on a single machine; containers all the way. I still have nightmares about the Java path issues. I'm probably one of the few people in the world who know this much about the JVM without ever having written any Java.

[0] We should have also been following 11 but just didn't have the resources to even think about doing it.


> I've had a project where we needed all of Java 7, 8, 9 and 10 [0] for the data stack: Hadoop, Scala, Spark, HDFS, zookeeper, pyspark.

I suspect that a big reason was due to using Scala, which is notorious for not maintaining binary backward compatibility across minor releases.


Scala breaks binary compatibility approximately every 2-3 years, which is fast by JVM standards but slow compared to most programming languages (e.g. Python releases an incompatible version every 1-2 years). You can blame Scala for at most one incompatibility in that long list; older versions of Scala are incompatible with Java 9+ (this is true for many JVM frameworks too, as Java 9 made breaking changes).


Scala's version is epoch.major.minor, not major.minor.patch. I thought 2.12.(x+1) is generally binary compatible with 2.12.x. Or at least I personally never ran into that issue with Scala itself.


To be fair, most languages don't maintain backward binary compatibility.

In fact, this is so common that some package managers don't even bother distributing binaries: go modules, cargo, conan, etc.


It's not as obvious as you might think. But if they are going to do it, they should clearly do it well.


Right, but my experience is that Java (and a few others) have succeeded in this space. I actually wish that I could use maven to install system software rather than having to rely on something like apt.


I am intrigued by the experiences you have had which lead you to precisely the reverse conclusion to mine.


I wish that I could do literally anything else in my life but use maven.


Wow. My experience is that Java got it spectacularly wrong, with different tools stepping on each other constantly.


No I don't. I use Debian derivatives because I can apt get all my stuff without thinking hard, because those package maintainers have done the hard work.


Well, clearly not, since you can't install Hadoop. Before you blame that on Hadoop, remember that it only requires some really quite basic things from its package manager, which apt is nevertheless completely unable to do: use libraries in the normal recommended way, play nice with Java, work cross-platform.


Whatever. (Great practices those, being stuck in a decade old JVM and requiring root to build! But well, whatever.)

If I cannot install it on Debian in a clean way, I will just try to avoid using the software (and everything that comes with its ecosystem). It may be unavoidable, but if there is an alternative, it will be preferred. If an alternative appears after I have been dealing with it for some time, it is still preferred (because the work with out-of-distro software never ends).


> Well, clearly not, since you can't install Hadoop.

You are complaining that you can't automatically install broken packages out of the box through the official repo.

And packaging an application is the responsibility of the people working on that application, not the OS.

What's your point?


An OS is a tool for running applications, not vice versa. Packaging an application should indeed be the responsibility of people working on that application, which is why language package managers like maven are successful. Debian explicitly rejected this philosophy in favour of one where OS maintainers are responsible for packaging all applications (thus the OS-specific packaging format and closed-world assumption of their package manager).


> Debian explicitly rejected this philosophy in favour of one where OS maintainers are responsible for packaging all applications

That assertion is quite wrong. Just because the OS has its official package repository, and just because OS maintainers volunteer their time to package some software projects, obviously that does not mean that maintainers are responsible for anything. You are confusing offering a convenience service with being responsible for packaging each and every software under the sun.

In fact, your baseless assertion ignores two basic facts: packages are proposed and adopted by volunteers, thus what you've described as "OS maintainers" is pretty much any random person who simply wants a software to be available for download in the distro's official repositories, and packaging systems such as Debian's apt supports private package repositories, where anyone can make available their packages to the world.

Another fact that you missed is that build systems such as Maven or msbuild or cmake or Gradle or whatever support downloading packages for one reason only: convenience. It has absolutely nothing to do with where the responsibility of providing packages lies. It's convenient that a build fulfills all its build dependencies. However, the responsibility of packaging and distributing a software product lies exactly where it always has: with those working on the software product.


> packages are proposed and adopted by volunteers, thus what you've described as "OS maintainers" is pretty much any random person who simply wants a software to be available for download in the distro's official repositories

The entirety of the Debian project are volunteers, random people who simply wanted xyz. And Debian's raison d'etre is as a "distribution" of existing software.

More to the point, Debian packagers explicitly overrule "upstream" application developers; see the history of cdrecord or the ssh key generation bug for particularly spectacular examples. How can you say it's the responsibility of those working on the software project when they don't get to make the final decisions?

> packaging systems such as Debian's apt supports private package repositories, where anyone can make available their packages to the world.

Not really, because apt's approach to dependencies requires a central naming authority. In practice any private repository is necessarily a dead end: you can package additional software that depends on software from the central system, but not vice versa.


For Linux distros, the primary responsibility for packaging is actually with the distro maintainers, since packaging is distro-specific.

That said, a well-written app should not present any difficulty for the package maintainer.


> For Linux distros, the primary responsibility for packaging is actually with the distro maintainers, since packaging is distro-specific.

No it is not. Although distros might pick and adopt some packages and include them in their official repository, it's obvious to anyone that as a distro maintainer you are not responsible for packaging each and every software package under the sun.

Moreover, Linux distros such as Debian rely on volunteers to propose and adopt packages, which obviously also includes people affiliated with each software project.

Additionally, it's patently obvious that the responsibility of packaging and making a software available to the public lies on the people involved in developing the software project.

> That said, a well-written app should not present any difficulty for the package maintainer.

That is true, particularly as the primary package maintainers are in fact those actually developing the software project.


How many packages in the Debian repos are packaged by their authors, rather than maintainers that are affiliated with the Debian project?

(I'm including volunteer maintainers in the latter category, unless they submit the package to the original author rather than directly to Debian.)


I don't need Hadoop because I have find, xargs, GNU parallel, Slurm etc etc


This is a very negative view.

Once you get used to this method of packaging it’s actually really empowering for the end user. You can download the source for anything in the system, modify it, recompile, and install it all with a standard set of commands (no one-off custom build steps for every package).

I feel it’s the closest thing to the spirit of “free software” we have today: software designed to empower the end user to read, understand, and modify their system without a huge amount of obfuscation.

It increases the burden on the packagers to reduce the burden on the users. I would be interested to see how often these features actually get used though. I wish distributions would put this front and center.

https://wiki.debian.org/WhyDebian


In my opinion there are simply too many dependencies to begin with. Some of the simplest things get put into a GitHub repository to be "shared" with the world when they would be fine as a gist. Software needs to be a little more self-contained. Software reuse these days is honestly taking "not reinventing the wheel" to lazy extremes.

System package management systems do things just fine. Ask yourself, “when did I last properly package a Debian library? Do I even know how to do it properly?” Most people won’t be able to answer that because they didn’t.

But truth be told, having each programming language a unique package manager sucks. I’d rather have a consistent way of managing packages on a given system so that I can use various programming languages.


> Software needs to be a little more self-contained. Software reuse these days is honestly taking “not reinventing the wheel” to lazy extremes.

The three cardinal virtues of a programmer are laziness, impatience and hubris. I've never seen a decent argument against using ever-smaller libraries (as long as our dependency management is good enough to track them); all of the usual arguments for reusing libraries still apply, even if we take them further than their originators imagined.

> System package management systems do things just fine. Ask yourself, “when did I last properly package a Debian library? Do I even know how to do it properly?” Most people won’t be able to answer that because they didn’t.

Well, as we're seeing, Debian is struggling to keep up with packaging of everything that users want. I have actually created some (unofficial) debian packages; it's fine, but it's not a particularly inspiring experience either. Creating a maven/pip/... package is generally a nicer experience.

> But truth be told, having each programming language a unique package manager sucks. I’d rather have a consistent way of managing packages on a given system so that I can use various programming languages.

In principle I agree. But I'd definitely need to be able to install independent copies of the same library, different packages for different users, and fundamentally just have a nice development experience when working on that package manager. And that's not something I see the sysadmin tradition being able to come up with.


>In my opinion there are simply too many dependencies to begin with. Some of the simplest things get put into a GitHub repository to be "shared" with the world when they would be fine as a gist. Software needs to be a little more self-contained. Software reuse these days is honestly taking "not reinventing the wheel" to lazy extremes.

All true. Did we learn nothing from the left-pad incident?


But platform managers like maven are 1) language-specific, 2) jealous, insisting that every OS and extra-linguistic dependency has to conform to their way of doing things, and 3) fundamentally, not very good, particularly with regard to multiple versions of the same package (technically, OSGi could do that...), repeatable builds, and the desire for do-what-I-mean behavior.

Platform managers have been successful because a) they have the same interface across operating systems, b) they're the only game in town if you want to work in that language, and c) everybody loves to re-write existing software in their favorite new language.


> 2) jealous, insisting that every OS and extra-linguistic dependency has to conform to their way of doing things

Not really. Language-specific package managers are usually capable of integrating with "system" dependencies from outside the language (certainly maven has decent support for this) - they kind of have to be after all.

> 3) fundamentally, not very good, particularly with regard to multiple versions of the same package (technically, OSGi could do that...), repeatable builds, and the desire for do-what-I-mean behavior.

How so?


Maven is just as opinionated, jealous, and terrible, but its impact is restricted to JVM packages, mostly (hopefully) in the development stage, and so the consistency issues don't come up.

I ultimately agree that the system package managers do a poor job of wrapping language-specific package managers. But you lost me when you suggested Maven has a good user experience compared to most OS package managers. If that was a sly joke, then it was a good one.


> jealous, insisting that every language has to conform to their way of doing things

Not at all. The large majority of packaging systems have reasonable and similar requirements for upstreams, like not bundling dependencies or not hardcoding paths.

> fundamentally not very good, having very limited ability to do things like install packages for a single user or install multiple versions of the same package

That's completely by design, and for good reasons.


While you could argue about:

> install packages for a single user

because of the security angle, not allowing this:

> install multiple versions of the same package

is completely indefensible.

Who are you, package manager, to decide that I shouldn't be able to use different versions of the same package? Do you know more about my context than I do?


> Who are you, package manager, to decide that I shouldn't be able to use different versions of the same package?

That's one of the main points of a distribution, rather than simply throwing software on a hard drive.

The maintainers have to work hard to guarantee that a specific set of packages and versions works well together. Well enough for most users to deploy 99% of the packages without surprises.

And then backport security updates (often faster than upstream), for 3 years, often 5.

And various companies provide longer term updates and maintenance.

Oh, and provide license checking and vetting.

Turns out that it's a lot of work and no serious distribution would consider supporting multiple versions of packages while guaranteeing the same level of quality.

> Do you know more about my context than I do?

Given your statements about distributions... most likely yes.


> Given your statements about distributions... most likely yes.

Hello there, ad hominem.

Linux distributions are overreaching. They should offer those services for core software. I should be able to install Inkscape 0.91 and 0.92 at the same time, easily.


I don't see the security angle either. You could design the system in a way that makes it able to force updates into the individual user's broken packages. Heck, it shouldn't even be connected to a user - it's just that every user should be able to install a package, and what install might mean is

  * compartmentalized side-effects (not per-user even -
    users might decide they want multiple compartments for 
    different apps) and 

  * a shared storage area for all the packages and their
    versions (i.e. Nix)


The design results in a flawed user experience, and the users will move past the restrictions one way or another. It's up to system package managers to realize why the old way of doing things is simply not good enough; otherwise, people will just move on.

I'm sad that this is not happening since in the process we're also losing the good bits of system package managers as per this article.


> Platform package managers like maven have been far more successful because they provide a much better user experience, and frankly I suspect this is if anything because they're so thoroughly isolated from the platform package manager.

This is just wrong. Package managers support simple models by design so you don't end up with a mess of conflicting and potentially insecure dependencies in deployment environments.

Use maven in your build pipeline, fine, but distribute a minimal package using tarball/RPM/APT/etc. so that only the minimal runtime dependencies install on the deployment target. In fact, I built my first RPM using the maven rpm plugin. It takes just minutes to set up and use.


This all boils down to trust.

`curl | sudo bash` is no different than .\install.exe. The question is about trusting the SOURCE and trusting the DISTRIBUTION channel (that plain-HTTP download from scala-lang.org violates the latter).

Where did you get it? From https://microsoft.com/... or from https://micro.soft.com/...? Whom do you trust more? The same goes for a pre-built VM image or whatever... do you trust the party that made this image/container available?


Trust, but verify.

> `curl | sudo bash` is no different than .\install.exe

.\install.exe runs a binary you already have. If it's Windows, you can see if it's a signed binary, and it will prompt you for admin access. If it's not signed, you can compare a signature of it to one you got from a secure source.

`curl | sudo bash` downloads the file from a remote source. The URL isn't listed here; was it http? In that case, now a MITM can modify the file before you run it. If it's HTTPS, that's better, but you still haven't compared it to a signature of the original file, meaning it could have been modified on the download site. And hopefully you're one of the seemingly few users that turn on password prompting for sudo access (and hopefully running it in a new terminal, to avoid an earlier sudo's session...)

So they are different. And containers and VMs are the same: if you can compare checksums before running them, or build them from source, you have confidence that they came from a source you trust. Dockerfiles in particular make rebuilding even more trivial than with source Linux packages, so there's not much reason to fret.
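
The "compare checksums before running" route, sketched with a placeholder URL and a hash obtained from a separate trusted channel:

    # Download to disk, verify, and only then execute.
    curl -fsSLO https://example.com/install.sh
    echo "<expected-sha256>  install.sh" | sha256sum -c - \
        && sudo bash install.sh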


It's also no different from using 'make install' or installing packages from a community repository, like Ubuntu Universe, homebrew, Arch AUR and so on.

There is an absurd amount of trust put in distributions and their package managers. Unless people are paying for it, or use very old software, they probably don't get the level of security they expect.


On the contrary, it's much worse. There's no way for 3rd parties to collect, test and redistribute installers in an organized way.


`curl | sudo bash` should never have become a thing in the first place. To me, it's the most "WTF?" mode of installing anything. Especially when the source is HTTP.

In what world is that secure?

Pity Qubes is so heavy, otherwise I'd be using it


would you feel better if your Makefile looked like this:

    install:
        [ $$(id -u) -ne 0 ] && echo 'please run sudo make install' || curl https://my.thing | /bin/sh

It goes back to what jve said - it's a matter of trust. How often do you blindly run 'sudo make install' without reading the entirety of the build script? All the time, I bet - it's because you trust the source.


No.

I think I have never run `sudo make install`. Things I install come from package managers. Docker or a VM is used to test other software.

And of course trust is important, but if I encounter things like `curl | bash`, my trust is lost.


What difference does it make if you run `make install` or `curl | bash`? In both cases, you are running code you have not audited yourself. Or are you the kind who inspects every Makefile and installer script before installing? If not, why is one better than the other?


Because at least with make install you can have some trust in the delivery method, i.e. encrypted git or whatever. If you use just plain HTTP....


So wget https://example.com/installer.tar.gz; tar -xvzf installer.tar.gz; cd installer; make install is okay for you?

But curl https://example.com/installer.sh | bash is not? Why?


I'm fine with the second too as long as the host is trustworthy and it's always https. Most of the time it isn't.


Whether it's `sudo make install` or `curl | bash`, in both cases you have the opportunity to first inspect the code that will be executed.


You don't have the opportunity in the latter case.

https://www.idontplaydarts.com/2016/04/detecting-curl-pipe-b...


Sorry, I don't follow.

If I have a makefile, I can inspect it and see what it does.

If I have shell script (or indeed a makefile!) that calls `curl | bash`, I can inspect that shell script and see the URL that is used with curl, and then inspect the contents that the URL returns.


You can't see what comes from `curl | bash` before you actually pipe it to bash, click the link and read the article please.


OK, I see - in theory, an attacker in control of the backend could detect that the script is being piped straight into bash and serve it different content than a plain download would get.

TBH, while I take your point, I do think it's a little disingenuous of you to claim that "You don't have the opportunity" to inspect the script prior to executing it - you ordinarily will, but can't in the unlikely event of an attack like the article describes, which would require an attacker to be in full control of the web server.

Off the top of my head, this could be mitigated in a couple of ways:

1. Hash a known-good script and check the hash matches prior to executing (this does however mean that you need to update the hash every time the remote script is changed)

2. Use curl to download the remote script to a local file first, and provide the opportunity to inspect it prior to piping it into bash
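
A minimal sketch of option 2, which also gives you option 1 for free if you pin the hash you computed (the URL is a placeholder):

    # Fetch to a file, look at it, note its hash, then run it.
    curl -fsSL https://example.com/install.sh -o install.sh
    less install.sh        # inspect before executing
    sha256sum install.sh   # record/compare against a known-good value
    bash install.sh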


> Off the top of my head, this could be mitigated in a couple of ways:

Or the third opportunity of not piping curl to bash and using a proper repository that has all these integrity and authenticity checks built-in.


Basically all makefiles you run into in the wild for mature projects allow you to install with a prefix, e.g. into the home folder. No need to sudo anything if you do that.
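
For autoconf-style projects that's just the following (the exact variable varies; some projects expect PREFIX=... on the make line instead):

    ./configure --prefix="$HOME/.local"
    make
    make install   # writes under ~/.local, no sudo involved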


This is an important point lost in the noise.

The problem is that cattle should apply only to containers, not hosts or VMs. Why? root. Which is also why Docker is a nonstarter for anyone who wants to use containers securely.


The holy grail of sysadmin has always been deployment of immutable images, instead of host-system pollution, source-code mutation during deployment and so on...

It was doable with things like Packer, KVM or OpenStack; now it's become easier with containers. You can still orchestrate your stuff with Ansible or bash scripts.

I most certainly remember how I was doing isolated deployments 10 years ago: https://github.com/jpic/bashworks/tree/master/vps HINT: it's much cleaner and much more automated with containers: with GitLab-CI it's trivial to host a private container building infrastructure with a registry and other automation ...
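
The CI job behind that is not much more than a build-tag-push; the registry host, image name and credential variables below are placeholders your CI would provide:

    echo "$REGISTRY_TOKEN" | docker login registry.example.com -u "$REGISTRY_USER" --password-stdin
    docker build --pull -t registry.example.com/team/myapp:latest .
    docker push registry.example.com/team/myapp:latest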


Talking from a developer's point of view, I've thought about this for a while and I think it has something to do with what I refer to as "anti-imposter syndrome" - the notion that anyone can make software. You see, over the past 10 years or so, self-proclaimed "educational" services/websites/institutions have been shoving down everyone's throat the idea that anyone will be able to create the next FAANG from scratch after just 3 months of training. Which is the same as claiming that I can become an F1 driver in 3 months. Where do I sign up?!?!?!

The thing is, F1 results are a lot more visible since you have a point of reference - the top dogs. Chances of scoring a fastest lap or coming even close to them - little to none. Software - not so much: you have a shiny interface, and what it does underneath is always a mystery to the end user. And people who have gone through those magical 3 months of training are often led to believe that the way they are doing things is what everyone is doing and that is how it should be done. Often people who have signed up for those courses are people who have just a smudge above average technical knowledge - they have no idea how OSes work, what should be considered safe or even why. Don't hate me for saying it, but essentially Windows users. And this is the software they end up building and distributing.

10 years down the line, thousands have picked up random scraps of knowledge from here and there and tried to mash something together. Don't get me wrong, I think technologies like Docker are astonishing and incredible tools in the hands of people with knowledge and experience. However way too many people have picked up some scraps from there and created the unholy mess the author is talking about.


I recently had a similar discussion with people using npm for building a CSS framework library. I tried to explain the concept of getting a pre-downloaded tarball and using "make" (or similar) to produce target artifacts from source files in a deterministic, repeatable and reliable manner, without relying on any third-party servers being available and without pulling in dependencies that might have changed.

It seems the concept was entirely alien to programmers younger than me.
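
The workflow being described is roughly this; the tarball name, pinned hash file and make target are placeholders:

    # Pre-downloaded, hash-pinned input: no network needed at build time.
    sha256sum -c framework-1.2.3.tar.gz.sha256
    tar -xzf framework-1.2.3.tar.gz
    make -C framework-1.2.3 dist   # produces the CSS artifacts the same way every time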


I know you're taking some flack for making it about age, but I do think there's some merit there. I'm 34. I feel like I'm in the middle in terms of software developer / sysadmin age. Lots of bright minds came before me and there are lots of bright minds out there right now in their early 20s.

A key difference is I can remember a time when network connectivity was flaky. When it was hardly a given. Even when it was available, the download times alone could often constitute a large part of the overall build time.

I think that we have all become complacent with regard to internet connectivity and service availability, but I think the younger you are the more complacent you are likely to be. If github.com goes down entirely, let's be honest - there are a lot of Jenkins builds that are going to be in the red.


I'm 36 and recently was assigned to mentor a new employee in his 20s. We had a moment of miscommunication when I asked him to use git to clone a local repository. He was confused when he couldn't find it on github.com (what I'd sent him was a path to our private network share). I had to explain that, yes you can use the github.com client if you want, but "git" is different from "github". I honestly couldn't tell if he understood the difference.

Now I'm losing sleep worrying if there's any way he can accidentally add a github.com remote to our private repo and push to it. I would be blamed, and I'd have to explain to managers older in turn than me what both git and github are.


I've worked with developers that have used git for 10 years who didn't fully realize what all of the 'git reset' options entailed, and I don't blame them. Git is complicated and you could certainly have a perfectly effective workflow with it for your whole career without using most of the features. If you'd only worked in environments in which code was entirely managed in GitHub, you probably wouldn't know that either. Judging someone because they don't possess the same slices of implementation-specific knowledge you do doesn't really make sense. They almost certainly know things that you don't simply because you've never encountered situations in which you had to learn about them, or for that matter, remember them even if you did.


> I've worked with developers that have used git for 10 years who didn't fully realize what all of the 'git reset' options entailed

While knowing some of them is expected if you use git reset on a frequent basis, knowing where to find information on the other options is essential. I frequently go back and read through the man pages for various git commands so that I understand what will happen if I use a particular set of options, and also to learn new things while reading through them.

So, the proper answer for a developer who doesn't realize what a git command is capable of is to refer to the man page for that particular command.
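
For example, the three git reset modes that trip people up are all spelled out in `man git-reset`:

    git reset --soft  HEAD~1   # move the branch back one commit; keep index and working tree
    git reset --mixed HEAD~1   # the default: also unstage the changes; keep working tree
    git reset --hard  HEAD~1   # discard the commit, the staged changes and working-tree edits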



I believe it's more beneficial for developers to learn the underlying model of how git works (blobs, trees, commits, tags, etc) and how various commands deal with those underlying concepts. The Pro Git book git internals chapter is a very good place to start. That, in combination with the man pages for the commands they go over, will greatly enhance one's understanding of how git works.

Relying on cheat sheets to learn how to use git is not much different compared to learning how to use a programming language via Stack Overflow. In other words, you'll never develop a more thorough understanding of how the tool works and how to use it effectively.


Right. Assuming that they're not competent because they can't recall a slice of domain-specific knowledge from memory is not good.


I am the same age as you, but I do not think this is about age, at least not mainly about age. My father, who is almost 70, has no issue understanding what git and github are, and I have worked with people under 25 who have not had any issues with this distinction either. And there are plenty of open source developers I have met who are pushing 70 who keep up with technology just fine.

Sure, I notice that younger developers do not know some things, like they never experienced the Java EE hype, so they can fall into some traps which are well known among older engineers.


Yeah, this seems like more of a competence-related problem than an age-related one. I never even considered that I might have to ask candidates if they knew the difference between git and github, but now I wonder....


> I had to explain that, yes you can use the github.com client if you want, but "git" is different from "github".

To be fair, this is largely the result of a concerted effort by GitHub to muddy the difference. If you didn’t know any better, you might think Google is the internet too.


That's funny. That same situation (not understanding that git and github are different things) has happened to me a couple of times but with older engineers.


People adapt to the situation they experience. Github (and other repositories) tend to be stable, so they are used. Network speeds have increased, so we use the network and expend less efforts on local caching etc.

There's nothing "complacent" about this: previous generations also relied on infrastructure and didn't plan for prolonged power outages or keep backup ham radio links for when AOL was down.

People using npm install instead of custom makefiles aren't ignorant or stupid, they have found better ways to achieve their needs. And if 10 years on the job don't create the need to learn some skill, there is no reason to invest time into it. And I have complete confidence that people would be able to come up with some workable solutions rather quickly if the githubcopalypse ever happens.

There is some cultural component at play among the "luddites" here as well, maybe comparable to preppers? It feels like planning for really exciting emergencies in which one's skills, derided for so long, are suddenly needed and save the day. In this analogy, I guess Makefiles are the equivalent of very masculine hunting and zombie-defending skills.


I half agree with you, and am chuckling a little bit about the "very masculine hunting and zombie-defending skills" part, but... I'd like to offer some perspective.

I'm 36 and work on a pretty broad set of consulting projects: some schematic/PCB/mechanical design, firmware, some lower-level desktop/server code, and up and up to web/mobile apps. "Full full stack" if you will.

I live in a "major" Canadian city (although not in the top 15 by population), and I also own two wonderful properties about an hour out of town in a quite rural area. One is a cabin on a lake, and the other is a church from the 1910s. Sometimes I head out to one of these places to do the "Deep Work" thing, distraction-free, and sometimes it's to take time off, but end up getting an emergency call from a client. In either case, my Internet connectivity is limited to tethering, and depending on a few factors, that can either work fantastically well or poorly.

Going from the lowest-level to the highest-level projects, there's a very clearly declining probability of the project being able to build during a low-connectivity event. The embedded stuff pretty much always works just fine (it's a Makefile, or CMake). The C desktop/server stuff? Always works fine (any dependencies were pre-installed). Python/Ruby/Elixir web backend projects usually go OK, although I've occasionally run into issues where the package manager wants to check for updates. Node front-end builds sometimes start to fall apart, and Android (via Android Studio) often refuses to build at all! (Some kind of weird Maven/Gradle thing that needs to go out and check something, even though the dependencies have all been pre-installed...)

It's extraordinarily frustrating when you can't change a line of code and hit "Build" to test a change locally. Everything's already present on the machine! It worked just fine 5 minutes ago!

To your prepper comment, and the previous comments about infrastructure, there's a significant population of the world that doesn't have 100% reliable infrastructure, even in Canada and the US. The tools we have used to work just fine in that environment, but are getting progressively worse.


> Some kind of weird Maven/Gradle thing that needs to go out and check something, even though the dependencies have all been pre-installed...

It is often possible to tell Maven at least to work in offline mode and not check for dependency updates.
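
For example, roughly (a sketch; it assumes the dependencies really are in the local repository already, and Gradle has a similar `--offline` flag):

    mvn dependency:go-offline   # resolve and cache everything while you still have connectivity
    mvn --offline package       # later, build without touching the network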


> There's nothing "complacent" about this: previous generations also relied on infrastructure and didn't plan for prolonged power outages or had backup ham radio network links for when AOL was down.

A lot of the protocols for asynchronous communication allowed for operating in offline mode. So if you didn't have an internet connection, you could still compose and queue emails; the client would only actually connect to the network once a connection was available, and then send the queued emails all at once (as well as downloading emails from the POP or IMAP server).

git actually has commands that leverage email for sending and receiving patches, so that code review and development can take place without requiring a connection at all, other than to send and receive when needed.
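
A rough sketch of that flow (the address, directory and mailbox names here are made up):

    git format-patch -3 -o outgoing/                       # turn the last 3 commits into mail-ready patch files (offline)
    git send-email --to=dev@example.com outgoing/*.patch   # mail them once a connection is available
    git am incoming.mbox                                    # on the other end, apply patches received by mail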


> A key difference is I can remember a time when network connectivity was flaky.

I’m young enough for this to not be something I have experienced, but I don’t have to have experienced it to understand that depending on things you don’t control can be a bad idea.


> without relying on any third-party servers being available and without pulling in dependencies that might have changed.

There are two different issues here:

1. Not pulling in changed dependencies. This is what "lock files" are for: to limit builds to known versions of every dependency. npm was terrible about this for a long time. Most other language package managers are better.

2. Not relying on third party servers to be available. Personally, I've mostly worked with Ruby's bundler and Rust's cargo, and in over 10 years, I've lost maybe two days of work because of package server outages. That's less than I've lost due to S3 outages, less than I've lost due to broken backup systems, and less than I've lost due to complex RAID failures. For the clients and employers in question, this was acceptable downtime.

In the rare cases where one day of noticeable build downtime every 5 years is unacceptable, then it's usually possible to just "vendor" the dependencies, usually by running a one-line command.
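
For example (a sketch; exact commands and flags vary by tool version, and newer Bundler spells the first one `bundle cache --all`):

    bundle package --all   # Ruby: copy gems into vendor/cache so `bundle install` works offline
    cargo vendor           # Rust: copy crate sources into vendor/ and print the config needed to use them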

For many small to midsize businesses (and many growing startups), a small risk of brief outages is an acceptable engineering trade-off.


> npm was terrible about this for a long time.

That's putting it lightly. To me "terrible" would mean they just didn't support it, "batshit insane" means they supported lock files but ignored them every time you ran npm install.

My favorite comment from this stackoverflow post: https://stackoverflow.com/questions/45022048/why-does-npm-in...

"Why would you expect something called package lock to lock the packages? Package lock is analogous to how, when you put any key into a door lock, the lock reshapes itself to match whatever key was put in, then opens the door. Now if you'll excuse me, I'm late for a tea party with a rabbit."


I think one of the scenes with Humpty Dumpty could be an appropriate allusion as well.


At work we also cache all 3rd party dependencies locally. Not because the 3rd party servers might be down but to be sure that one/two/... years from now we can still recreate the same software we delivered at that time to a customer. Our build machines typically didn't even have internet access. If a dependency was not available offline (due to developer error or whatever) we would know quickly.

In certain industries this is even mandatory if you want to be seriously considered as a supplier. If a build tool does not support reproducible builds in such a way (both pinning dependency versions and fetching them from a cache somehow), or makes it difficult, then it is considered a hobby toy that has no place in the workplace.

Even for small businesses I would advocate to take this seriously from the beginning. It's not that hard and will save you headaches later on when suddenly reproducible builds become important.


> At work we also cache all 3rd party dependencies locally [...]

Can you say a bit about your platform and tooling? Are you working in a single language or a polyglot world? Is the caching at the network level or are your build tools aware of your mirrors?

I work in a space (biotech/pharma/...) that shares these concerns. I've solved the Perl specific version with Pinto (https://metacpan.org/pod/Pinto) and the more generic version with Spack (https://spack.io) [which is neat also because it supports installing multiple versions of applications, doesn't require root, <other things>].


>I've lost maybe two days of work because of package server outages [...] For the clients and employers in question, this was acceptable downtime.

Even if the downtime is fine, I don't find it goes over too well if framed as "we rely on a 3rd party service, that we have no contract with nor any guarantees of reliability or product longevity."


npm has only gotten worse about lockfiles over time, not better. I wonder out loud some times if they know what the word 'lock' means.

If yarn would fix a particular bug that blocks our workflow I'd be burning political capital at work to get us off of npm as fast as humanly possible.

As to proxies, we have something misconfigured with ours, such that occasionally it gets latest of half of the React or Babel ecosystem and latest-1 of the other half, resulting in dependencies that can't be resolved for a few hours when they increment a minor version somewhere.


There are other advantages such as build speed. Having local copies is simply faster.


Truth be told the whole front-end ecosystem relies on those servers being available, so in their absence any development everywhere would grind to a halt - as it almost did during the left-pad scandal.

While I agree that this is not exactly the sanest approach, within the ecosystem there's no incentive to work differently.

Also, like someone else mentioned - dependency hell only got worse over time - setting up a new project you're likely to have several versions of the same library in your node_modules.


> Truth be told the whole front-end ecosystem relies on those servers being available, so in their absence any development everywhere would grind to a halt

There's a world out there of LAMP/LEMP stack web developers that wholeheartedly disagree with this perception.


There’s a pretty straightforward solution to this problem which is to run your own NPM server even if it’s only a caching proxy. For some orgs, the uptime provided by the third party servers is sufficient.
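
A minimal sketch of that with Verdaccio, one of several options (the port shown is just its default):

    npx verdaccio &                                  # run a local registry that proxies and caches the public npm registry
    npm config set registry http://localhost:4873/   # point npm at it (globally, or per project via .npmrc)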


Having been at 3 companies that did this, I can say with certainty that our caching proxies (artifactory in all of them) were much more likely to go down than the public repos were.


Yeah, I think I've lost a day in total across the several occasions when our local npm proxy was finicky or failed to fetch the latest version of a package that we've developed in house but made publicly available.


They just don't have to be down at the same time to be useful.


Yes, but at least in that case you have the fallback of the public repos.


Maybe it's obvious and straightforward enough that other people have come up with the same idea, tried it and not found it to be worth the effort?


I like using Artifactory for this, you keep a local copy of all NPM / nuget / etc dependencies hosted next to your build infrastructure.

What shocked me most about NPM is that it used to have absolutely 0 verification built in, yet it was being heavily promoted by very well known, educated and experienced tech celebs. All at a time when it was basically a hobby toy.


Last time I tried to use Artifactory for its caching it barely worked. It would randomly 404 when a package was not in its cache yet, and you'd end up retrying a CI build for a while until everything was there.


At work I'm running ~2000 builds a day across 14 repos, mix of c# and php. We're proxying several hundred NPM, composer and nuget packages on 2 artifactory containers running in high availability. It was a rocky start but about 2 months in everything was pretty stable.

If you're still opposed to Artifactory you could try Sonatype Nexus.


To follow up: I ended up building my own Docker images which contained the build tools (also known as the npm burning dumpster tire fire with extras thrown in), source code and all dependencies. This allows me to get reproducible builds, the modern-day analog of "./configure; make".


It can be done using package managers too, for example using Nix. The problem here is not related to distribution method but rather dependency hell in javascriptland.


I would say the main issue is that Nix is super difficult and requires you to intimately know your entire dependency tree down into Linux particularities. JavaScript makes matters worse, but even packaging a nontrivial Python package pulled directly from Pypi is often difficult.

I want to like Nix, but it is far too pedantic to be practical. And this isn’t even mentioning the usability issues.


I think you only need to specify your immediate dependencies in nix?


Someone has to write the package definitions for those dependencies, and the public package repository doesn’t have broad coverage for many popular packages in many languages. This is understandable in that this is a massive effort, but that’s also the point—the effort to manage packages is huge and the extra effort compared to other tools is not a good bargain for many projects, nor is it much of consolation for project maintainers who would otherwise like to use it.


18yo JS/Ruby/Go dev here. `make` is certainly known to us; it's just that platform-specific build systems (`mage` for Go, `rake` for Ruby, `gulp` for JS etc.) are simply more sane to set up per language than using `make` for all languages, and they allow for efficient code reuse.


You have make call the platform specific build systems.
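
For instance, a thin top-level Makefile that only delegates (a sketch; the directory names and npm/rake targets are made up, and recipe lines must be tab-indented):

    .PHONY: all frontend backend

    all: frontend backend

    frontend:
    	cd web && npm ci && npm run build

    backend:
    	cd server && bundle exec rake build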


What's the point in a project that only uses a single build system? You're essentially getting nothing out of make except indirection. Might as well just have a build.sh at the root of the repo.


> You're essentially getting nothing out of make except indirection

The thing you're getting is some familiarity, I think (because of the indirection). It's sort of like using a `package.json` for the scripts section, but you get to have comments, and there's less other random stuff in there.

I definitely understand why many would think this is over-kill though. A proper readme is probably just as good/better.


familiarity, documenting the process for building, and last but not least, access to the entire unix toolchain.


I have done so in the past. This leads to a lot of duplication though.


I use it mostly so when I come back to the project after months, the Makefile is a reminder of how to build it. Also autocompletion.


Wouldn't it just be like a build.sh? And Makefiles aren't cross-platform enough, especially on Windows.


I can't speak to the others, but if you've installed rubygems on Windows, you've installed MinGW, and you have access to those tools. This is because rubygems will not build source gems against Microsoft's toolchain, but against the gcc toolchain a la MinGW.

If you prefer using Powershell on Windows, great (or bat if you're a masochist). The point was that being in Ruby doesn't automatically invalidate the use of make as a build tool.


Here is how to use tools on Windows via my chocolatey packages without WSL, mingW etc.

    cinst ruby                        # install ruby
    cinst msys2 --params "/NoUpdate"  # install msys2 without system update
    Update-SessionEnvironment         # refresh environment vars
    ridk install 2 3
See notes on https://chocolatey.org/packages/msys2


Yeah, maybe that is true for Ruby development on Windows, but not for others, say Node.js. Developing Node.js on Windows is (comparatively) easy: usually you only need to install `node` and `npm`, then optionally install `yarn` and `windows-build-tools`. None of these steps gives you a GNU toolchain, so it won't be available until you install it.

That is why I prefer OS-agnostic tools such as npm or npm+gulp for more complex build tools.


make IS an OS-agnostic tool, it's just not available by default on Windows. But it can absolutely be installed. Or you can bring the MinGW binaries along with you. Even something as simple as installing git for windows will give you access to a decentish unix environment.

staying inside rake, et al is fine if your software is simple enough to get away with it (by simple I mean self-contained without too many moving parts). But quite often you can't get away with that, and make becomes a good choice.

I would also argue that if you're developing ruby on windows you're doing it wrong. You can do it, but I would personally never target Windows for a ruby app. been there, done that, have the scars to prove it.

I would also add that installing all of that for node doesn't really seem to be simpler than using make, but that's just my old timer sensibilities coming into play.


I don't think it's specific to just younger devs, but there absolutely is an argument for having your own artifact repository that you can control.


I sometimes have problems doing `./configure && make` because dependencies vary across distros, which have subtle differences that make it not as easy as expected.


If you can do `./configure && make`, you can also do a `checkinstall` on debianesque systems, which gives you a package so you don't have to do it again.
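
Roughly (a sketch; checkinstall will prompt for the package name and version interactively):

    ./configure && make
    sudo checkinstall   # runs `make install` but wraps the result in a .deb you can reinstall or remove later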


checkinstall does rpm/slack too (or did), not sure how debianesque they are.


True. But at least it doesn't auto-download half the internet (with new and improved vulnerabilities of the day) and dump it all in your "project directory" (whatever that might be, I'm still fuzzy on the exact definition).


But it is consistent, which is very important.


It can be sold as a new invention in a few years.


>building a CSS framework library

I think that might have been the root cause of the problem.


I don’t think it’s age so much as it’s “well, that’s just what everyone does now”.

If you want to work in this industry you have to follow the herd; for better or worse.

I certainly don’t agree with it, but people have bills to pay.


What if "well, that's just what everyone does" is just wrong? I trimmed the word "now" from the quote because lazy sloppy short-term thinking is nothing new. Good configuration management takes a lot of work and pays great returns, only not in the very short-term. It's not surprising that inexperienced devs will fall into this trap.

And just why do you "Have to follow the herd"? Pretty sure just about everything new and creative (good or bad) was a result of not following the herd. I recommend going where you need to go. If a herd starts following you, that's great. At least they'll be slipping and sliding on the shit you leave behind, not the other way round.


Managers and C-levels surely have an appetite for this. In the end it mostly degrades into trial-and-error, which I'm sure is no problem, because the people who have to do the real work get sick of it and get replaced (sometimes voluntarily, sometimes with force), or play along nicely out of fear of losing their job (thus feeding the beast).

It's not very surprising since 'products' nowadays are more like 'services' instead, and offer some kind of encapsulation: people are not interested much in how something ticks behind the scenes, and so it can drive down the behind-the-scene quality. They also are conditioned to accept lousy excuses (including none) for outages and breakdowns, because they got sold 'magic', and boy that is magical..

You might have to work in obscurity to keep up operations quality, which is in turn a driver for your own demise. Game over.

I'd like to suggest to you the following reading material:

- Bullshit Jobs (David Graeber, 2018)

- The Dilbert Principle (Scott Adams, 2000)

- Future Shock (Alvin Toffler, 1970)


Your comment reminds me of the "Lemmings walking off a cliff" animated GIF that was popular once animated GIFs became a thing on the web.


You'll get blank stares and resistance whenever you throw a bunch of jargon at people that they don't understand, though. It makes people feel insecure.

You can't expect new programmers who sound like primarily front-end devs to learn all the hot new stuff and all the old stuff, too.

If you want someone with that kind of knowledge, you need to hire a senior dev with 20 years of experience.


Easiest thing to do to solve this would be to have something like artifactory or even squid in front of npm and use exact version numbers. Front end devs are going to use NPM. Solve the other problems separately like you would with any other build system.


Sure. And then when something unexpected happens, you have two problems.


You mean clearing a cache? I don't understand. Using a proxy to protect yourself from missing dependencies/down servers is a bad thing now?


s/younger/less experienced/


Not alien to me, likely younger.


Don't turn this into an ageist thing; unless you are particularly old, there will be plenty of older developers who also don't understand this.

It largely comes from understanding _why_ reproducibility is a good thing, and there are _lots_ of open source maintainers that understand this that are likely "younger than you". The vast majority of developers though focus on other things; that SQL injection attacks are still a thing indicates that we as a community have a _long_ way to go on the security front.


It is way bigger than security, and hits upon quality of life and reliability.

Without a deterministic build, I don't know whether what I check in actually works across environments, or whether the deployment artifact works.


checksums are a thing, as well as package mirrors or outright holding onto the file and checking in a binary.

There's loads of ways to know what you are using without needing to strip mtime's from zip files.


Checksums solve a different problem. How does a developer know that code which works on his laptop will work in production? One condition for that (though not the only one) is that his development artifact builds with the same dependencies as it will in production. There are a lot of possible causes for dev/prod to differ wildly. One of them is poor build practices.


Without a deterministic build, I don't know whether what I check in actually works across environments, or whether the deployment artifact works.

Actually that's mostly a given in JS land.

Many bright minds worked hard for it to be this level of idiot-proof.


I recently discovered that in node 10.10 (IIRC), you need oracledb < 4.0.0 whereas on other versions of node like node 10.16, you can use oracledb 4.0.1 (again, IIRC). I discovered this via the message when it blew up on application start, since it's not encoded in the package versions.

I may be a better idiot.


Oh, that.

Weird that it didn't complain during building or installation.

Anyway I had one project where I couldn't use `async` `await`, because apparently this feature requires Babel 7, which in turn requires a version of node fresh enough to support generators.

This tends to happen, but it's rare to discover such an issue only after starting the application.


> Without a deterministic build, I don't know whether what I check in actually works across environments, or whether the deployment artifact works.

I used to have this concern, but it's really not an issue with modern package managers. Even without a checksum in a lock file (e.g. gem bundler), I haven't actually had a deployment break because of dependency changes. The biggest issue is being blocked for a couple hours because a package repo went down.

Most of this is solved by using a self-hosted or 3rd party proxy package manager.


The maven/gradle ecosystem actually does use signatures. That's one of the things they got right. Maven central has no unsigned artifacts. Also, the signatures are actually checked on download. Other maven servers may be a bit more sloppy of course. In any case, if you are running that on a production system, you are doing it wrong. You run that as part of CI. The output of CI is containers which you start on a production machine.

The point about containers is that they should be immutable. Running chef/puppet in kubernetes is not a thing (I hope, probably somebody went there). Updating a container means replacing it with a new one, that hopefully you or somebody else built in a responsible way.

That last part is indeed a problem that we have moved instead of solved. 20 years ago, people were just sticking whatever they downloaded onto hardware they bought at the store and banged on it until it worked (very literally sometimes). So, I guess this is progress, but indeed hardly ideal. I remember using puppet. Can't say that that is with any level of fondness. I vastly prefer Dockerfiles and having CI systems produce containers from those.

Installing things on a filesystem is no longer that common at deploy time, unless that filesystem is that of the container you are building using a Dockerfile or you are self-hosting kubernetes (aka reinventing a lot of wheels at great cost). Puppet/chef etc. are still of use if you are doing that, but otherwise they have a limited role in IAAS-type architectures. The closest thing is perhaps pre-baking AMI images using packer and some tool like ansible, which is nice if you want to avoid a lot of startup overhead.

Hadoop is complicated, which is why companies exist that host it for you or will help you host it on premise in a responsible way. If you want to DIY, you indeed have to do a lot of things and do them properly. Kubernetes has moved that space forward in recent years, so it's easier, but this is not for everyone. If on the other hand you are messing with puppet to get this stuff done, maybe reflect on the wisdom of not standing on the shoulders of giants rather than blaming the internet for your self-inflicted pain.


Discussed at the time (443 comments): https://news.ycombinator.com/item?id=9419188

And again in 2018 (426 comments): https://news.ycombinator.com/item?id=17083436


But unfortunately, it's relevant to this day.


More than ever.


Terrible article IMO. Hadoop is an awful mess, but it has nothing to do with Docker, which is simplistic in comparison.

"Ever tried to security update a container?"

Yes, I have! In fact you can maintain patch compliance in a container pretty much the same way you'd maintain a VM or bare metal Linux installation!


You missed the bigger point.

Just look at the popular images on Docker hub.

A lot of them involve messy build steps, including downloading binaries or source tarballs without verification.

It's often hard to know which dependencies a Docker image has, and therefore hard to track vulnerabilities and redeploy fixed images.

A lot of docker containers end up either running for a long time, or get rebuilt and redeployed often, but have pinned versions of base images or dependencies which still leaves vulnerabilities in place.

This is compensated a bit by the fact that Docker does provide somewhat decent isolation, and most containers are run in a cloud environment, behind load balancers and with decent security constraints. But there are many pitfalls in the whole process.

Containers provide a lot of advantages, and are certainly where things will continue to go.

But we need to figure out the tooling and ecosystem story to build, verify and update container deployments securely.


Just build your own. It's not that hard.

Before containers we would script the install process using Ansible. With containers we script the install process using a Dockerfile. If you know how to do one, you know how to do the other. Just don't be lazy and use unofficial images created by people you don't know or trust. Just like you wouldn't pull any random role from ansible-galaxy. Or pull a random guy off the street and ask them to provision your infrastructure for you.

Then all you have to do is just rebuild your images every week and you're set. All we had to do for this was have our Jenkins build pipeline run weekly & have it pass `--pull --no-cache` to `docker build`.
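
Concretely, the scheduled job boils down to something like this (the image name is made up):

    docker build --pull --no-cache -t registry.example.com/myapp:latest .   # re-pull the base image and ignore the layer cache
    docker push registry.example.com/myapp:latest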


You can even avoid dockerfiles by using Ansible: https://docs.ansible.com/ansible/latest/scenario_guides/guid...


We tried that about a year or two ago with `ansible-docker`. At the time it didn't leverage the docker cache so it was quite slow. Looks like that got replaced with `ansible-bender` which according to the readme does use the docker cache!

Thanks for mentioning this, will definitely have to give this another go.


"But we need to figure out the tooling and ecosystem story to build, verify and update container deployments securely."

Already done: Solaris zones. Available in the SmartOS distribution near you. Combine with OS packaging, imgadm and vmadm commands for maximum impact.


I've spent a minute skimming some of the docs, but it's unclear whether these tools can do a better job than the docker-like tooling. The vmadm "build an image from scratch with a json file" approach doesn't help in building Hadoop any more than the oldest versions of docker did using a minimal image and a Dockerfile.

How does it provide visibility into every dependency inside an image? Especially when using 3rd-party-maintained images that represent significant hours of work per image just to get working.

(BTW., I built "live CD images" for use in physical and virtual machines since ~2001, using tools such as mklivecd, livecd-tools etc.).


It doesn't. He's a troll that claims solaris is the solution to every problem that he doesn't understand.


I suspect there's a similar story around LXC available.


> A lot of them involve messy build steps, including downloading binaries or source tarballs without verification

Do they though?

Whenever I write Dockerfiles that depend on external downloads, I always check the hash matches one baked into the Dockerfile itself, and I've always seen others doing the same.
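
i.e. something along these lines, typically inside a single RUN step (the URL and checksum are placeholders):

    curl -fsSLo /tmp/tool.tar.gz https://example.com/tool-1.2.3.tar.gz \
      && echo "<expected-sha256>  /tmp/tool.tar.gz" | sha256sum -c - \
      && tar -xzf /tmp/tool.tar.gz -C /opt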


I have seen this many times, generally the high profile containers do that, and these issues are more common among my peers. This requires self discipline and containers were pushed as a panacea for environments where the discipline is lacking.


What is the point of storing the hash of an external file instead of checking the file into local version control? You're making it a part of your sources either way.


People who want to run docker don't want to run artifactory.

I'm not saying it's a good thing, but that's what I observed.


Do you have any examples of high-profile images that do this?

I really just haven't seen this with the ones I use.


Sorry, I see that my response looks ambiguous.

I meant the opposite. The high profile containers generally do the right thing, but containers created by peers rarely do it.

There's no mechanism to make things reproducible; you need to implement it yourself, and from my experience most people (not the ones that release something publicly, but the ones that work for companies that use it) don't do it.


The article focuses on Hadoop, but the generalization to containers is spot on.

Containers have become the first stop method to hide overly complicated build processes. When you've noticed your wiring has become a complete mess of knots, just put it in a box so no one will trip.

It seems to me that containers are useful for deploying final setups, but should be a no go for packaging tools.


>When you've noticed your wiring has become a complete mess of knots, just put it in a box so no one will trip.

On the other side of the coin, putting stuff in boxes is the essence of architecture. CPU opcodes? Boxes. Compiling readable code into bytecode? Boxes. Objects? Boxes. Functions? Boxes.

Sure, it can be abused. There might be spaghetti in those boxes. The spaghetti should be chopped into digestible chunks (smaller boxes!). But putting it in a box makes that an implementation detail. Putting stuff in boxes is still progress.


All problems in programming can be solved by adding another layer of abstraction (ie a box) - except too many layers of abstraction.

Good architecture is not about putting things into boxes, but about figuring out what boxes you need and which you do not. Sometimes a box is good, sometimes a box is bad. It isn't the fault of the box that it doesn't contain the right thing.


I don't think it's fair to blame containers for that. Just because someone programs a shitty product in Visual Studio, does not mean it's a problem with Visual Studio.

What containers do beautifully is dependency containment. I make something and put it inside a container with the exact dependencies it needs, and it'll work alongside other containers running programs that might have different, incompatible dependencies.

What we're seeing is not so much that developers suddenly decides to throw best practice into the wind. They just do not have to put the same amount of effort into fixing the broken process exactly because we have dependency containment. I believe we would see the same mess without containers, and the process change that would fix one would fix the other.


Containers are great, for example, to internally deploy some internet-facing web component your company wrote, but they are not that great as a means of generic software distribution. Things get complicated because to build the first type of container you usually need software of the second type. In my case I just mirror good/proven docker images in my internal repo(s) and rebuild images from scratch in the most reproducible manner I can find when the software is badly packaged.


I would see it more as to dividing your wiring into segments with breakers in between so that one rogue device doesn't bring down the whole network.

Seems a good thing, to me at least.


Containers are fantastic for packaging tools. What problems do you see with it?


Agreed, up to a point:

Ever tried to security-update a vendor-provided container which you should never touch because of the vendor's support and warranty conditions? And even if you're allowed to touch it, there's the mess that it usually is: app running as root in the container, stuff chmodded a+rwx -R, weird base distro, a build script that is just a wget, cp -R or "this binary magically appears from somewhere". With a helping of "only works if the directory structure and environment look like the dev's machine".

Containers are the ultimate expression of "it compiles, let's ship it" laziness


But that's vendors, not containers.

You can say exactly the same about vendors on a VM - applications running as root, bad passwords set, reboots impossible, patching impossible, backups... Or a virtual appliance. Or a third party application you installed and cannot upgrade due to a weird dependency on libc, or java, or a specific windows versions.

Vendors like that are always a mess, no matter the technology. Containers can turn into an equal mess, yes, but that's up to the build chains and ecosystem built around the containers.


Agreed. Most vendors supply an unholy mess of weird setup.exe, shell-script installers, omnibus packages or appliances as well; containers are not alone here.

Edit: To clarify, containers and VMs are somewhat special in that it is very easy to do the wrong thing, and you get a lot of rope to hang yourself with. The application developer by default takes responsibility for the whole OS stack (maybe except the kernel, with containers) but usually doesn't handle it very well. Whereas with packages or even setup.exe, the default case in most IDEs and tools is to package very little, and only upon request (using omnibus e.g.) to include the whole shebang.

edit2: the definition of 'vendor' of course includes a lot of OSS projects who are just as bad as most commercial vendors in this regard, as exemplified by all the aforementioned Apache Foundation Java Crapware


> Hadoop is an awful mess, but it has nothing to do with Docker, which is simplistic in comparison.

The fact that they can get away with a build system like this is very much because docker (and curl | sudo bash) allows people not to feel the pain this mess causes, at least not right away.

> Yes, I have! In fact you can maintain patch compliance in a container pretty much the same way you'd maintain a VM or bare metal Linux installation!

But then you lose the declarative/immutable nature of docker, no? Except if you mean every time there's a security update, you rebuild your containers?


It's pretty hard to do truly declarative/immutable things with docker because the typical dockerfile starts with "apt-get update && apt-get install ...". For that use case I think nix is much better.


Linters complain when you don't pin the version, though


That's what I tried doing. The result was that the docker container didn't build anymore, because at least one of the packages we used had a patch release every other day and apt only serves the most up-to-date version.


Except if you mean every time there's a security update, you rebuild your containers?

That's exactly what you should be doing, at least if you're relatively small scale. Google probably takes a different approach but most of us aren't working at that scale.


First: No one installs Hadoop from scratch, and Hadoop isn't built with Docker. Companies use Ambari or other distros like EMR.

Second: Yes, generally you can use a build system like Jenkins, and a registry like Artifactory to automate the process. The docker image is updated, and then you can push it out with your orchestration in whatever method you choose. It's not an obscure or difficult thing to manage..


>Unfortunately, the build process for these packages is currently of a disastrous quality, and should only be attempted within disposable virtual machines, as it requires root permissions and will install non-packaged software.

https://wiki.debian.org/Hadoop

If Debian, a distro on the lower side of dramatic, calls it disastrous, it's pretty bad.


Debian is (mostly) low on the drama, but high on the expected package quality. It's not surprising they'd have problems here.


I couldn't disagree more, by looking at how badly Debian dares to diverge on basic packages such as Apache, nginx and exim it's clear to me that they don't care at all about quality.


Divergence from the upstream isn't a marker for or against quality, as Debian chooses to define it.


I think the article is right on and makes a good point. The title does imply that containers are somehow bad for system administration. But the point is not that a container with a nicely installed, maintained, sensible image and applications pre-loaded on it would be a problem for system administration. Nor would it be anything but trivial to update patches in this situation.

The real point the author is making is that many times containers are misused to paper over problems like lack of reproducibility, dependency hell, etc. They aren't reproducible, certainly not by you. If they were you'd not need the container in the first place, or you could build your own easily. So for this kind of misuse, the analogy to Tucows Windows 95 shareware is pretty close.

Hadoop and Docker are just the examples to make the point. And they are good enough ones, because I could understand exactly what the author meant. There are certainly hundreds of other examples.


I suspect Hadoop was just used as an example. For another, I would suggest Concourse (build/CI tool). When I tried it, it was docker images for the default install, vague instructions, and a dev forum which seemed rather hostile to anyone asking for support to build or configure outside of containers.

Might have changed since then, but a big turn off.


I was confused when I read the article. I kept thinking it was actually a problem with hadoop and not docker, since docker is nothing but an abstraction on top of the Linux kernel.

I remember trying to produce a viable alpine-based docker image for a flask backend service we had that produced PDFs using LaTeX. Dependencies were really hard to pin down, and I had to resort to open issues on github, but ultimately all packages were in the distribution, so it is not a problem with the tech itself, but the archaic way Java lets people produce applications.


Standard Makefiles are really under appreciated today. They are simple and can be used with most any language, version controlled and have been around for decades.


They don't solve any dependency management problems.


Most ports-like systems actually use makefiles to solve cross-package dependency management. (Whether that is a good approach, and whether writing what is essentially a package manager as a hairy ball of makefiles is a good idea, are other questions.)


Agreed (although I so wish they weren't utterly dependent on tabs!)

I even sometimes use makefiles in conjunction with Docker, for example to automatically add the correct tags to a build - it's much easier to type `make build` than to remember a massive invocation for `docker build`.
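
e.g. (a sketch; the image name is made up, and `git describe` is just one way to derive the tag - the recipe line must be tab-indented):

    build:
    	docker build --pull -t myapp:$(shell git describe --tags --always) .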


Makefiles are problematic on Windows, though. There's no Make out of the box, there's a bunch of incompatible makes that are subtly different in different ways, and then most Makefiles tend to assume sh and POSIX at some point.


Makefiles are too low level to be convenient for anything other than simple projects on one platform.


If the alternative you are hinting at is CMake, I can't really agree with you.

I tried it recently and was shocked at how awful the syntax and usability was.

It's begging for a tool to auto-detect your dependencies from source. I was trying to port an existing project to it and I gave up when I realised that I'd have to manually create about 30 CMakeList.txt files or whatever they call them.

Meanwhile during the same weekend, autoconf allowed me to build gcc from source with three commands. I know I know, I'm not supposed to like autoconf, and the orthodoxy is that it is garbage ...

It seems every generation ignores the previous generation's tools and invents worse ones to replace them.

If this continues we'll soon go back to banging rocks and sticks together.


I am definitely not hinting at CMake!

Better options are Bazel, Buck and Please. Please is the most light-weight option.


Take a look at Ninja and Bazel. I agree with you that CMake is worse than autoconf, but I would argue that the tools which came after CMake are better than both CMake and autoconf. So while there certainly are regressions, not every new generation is worse than the previous one.


Ninja is an optimized back-end for other build-systems (including but not limited to CMake). It's not really intended to be used directly.


But it's not a product you can sell or artificially inflate its monetary value so out the window it goes.


ha +1 for the build tool mess. It's like everyone goes "duuude this stuff is so complex, let's build something on top to abstract it all away" and you end up with another layer of complex stuff that builds the stuff that builds the stuff that wraps the stuff you originally wanted to use...

