Confession Of A C/C++ Programmer (2017) (ocallahan.org)
62 points by yagizdegirmenci on Jan 25, 2021 | 99 comments



>> I cannot consistently write safe C/C++ code.

I think it's going to depend on your attack surface. Could I sit down and write a web-facing server from scratch and have a reasonable expectation of safety? Probably not. Can I write a piece of firmware for an embedded device with no internet access and a very limited serial protocol, and have a reasonable expectation of that being 'safe'? That seems more likely, particularly if I follow good practice and use available analysis tools where possible.

I think the biggest problem with these languages is that when something goes wrong (as it so often does), the keys to the kingdom are up for grabs: the whole of everything can suddenly be manipulated or read out as arbitrary memory (e.g. Heartbleed).


It almost goes both ways. Yes, being on an embedded device with no Internet access reduces your external attack surface. But it also limits your ability to use lots of newer language features designed to improve safety, because those features often cost CPU cycles or memory.

My (completely uninformed) guess is that most of any net benefit from being on an embedded system comes from the simple fact that "embedded" implies "constrained", which implies "less code," which, in turn, implies "fewer opportunities to create bugs in the first place."


I don’t think that you are limited regarding newer language features when you develop for embedded devices, or limited in the use of more safety-focused languages.

E.g. Ada, despite being on the older side for programming languages, was specifically designed with embedded programming and safety features in mind and does this quite well.

It also supports OOP if you want that sort of thing and, more crucially, has pretty good concurrency support built into the language while still being reasonably safe. With SPARK/Ada you even get a language variant that can be formally proven, to name some of the more modern use cases.


But this wasn't talking about using other languages with more of a safety focus. It was talking about using newer features of C++ that are meant to support safer programming. Most of these come in the form of abstractions with non-zero cost.


OK, but...

I see three options. 1) Use the non-zero-cost abstraction to do X safely. 2) Write the equivalent abstraction to do X safely, which will cost you the time to write and will probably be buggier than C++'s version. 3) Don't do X.

The fourth option - do X but in an unsafe way - should not be on the table for a professional developer.

Now, C++'s philosophy is that you can't write your own thing to do X more efficiently than C++'s builtin version. They may not perfectly achieve that in all cases, but they come fairly close. That plus bugs means that I won't reach for option 2 unless I have really good reason to think I can do better.

That leaves using the builtins to do X safely, or not doing X. At least that's how it looks to me...
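
To make option 1 concrete, here's a small illustration of the kind of non-zero-cost abstraction being discussed (the function names are made up for the example):

    #include <cstddef>
    #include <vector>

    int checked_get(const std::vector<int>& v, std::size_t i) {
        return v.at(i);   // pays for a bounds check; throws std::out_of_range
                          // instead of reading out of bounds
    }

    int unchecked_get(const std::vector<int>& v, std::size_t i) {
        return v[i];      // zero cost, but undefined behaviour if i >= v.size()
    }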


It certainly plays into it - fewer moving parts means less opportunity for things going wrong.


>Could I sit down and write a web-facing server from scratch and have a reasonable expectation of safety? Probably not.

It is very surprising to me that people think this. I am not a fantastic programmer, but I don't see writing safe code as difficult at all; you just have to understand the basics of how attacks work and follow some basic rules when writing code.

In order for a piece of software to be exploitable, you need to have an input channel that is not isolated from the functionality. Something like taking the parameters in the URL and passing them directly to a shell is a massive failure in isolation.

As long as you do proper input sanitization, where the input must have a certain format and a certain length for each part, the attack surface goes down to an absolute minimum. For example, of all the C/C++ code that I have seen that deals with the web, only once do I remember seeing a check on the characters in the passed data being in the set [0x10, 0x13, 0x20-0x7e], which should be the first check performed on any HTTP data passed to it, even before you start validating the HTTP syntax itself.

As far as memory safety goes, never rely on null-terminated strings; always copy the exact amount of data up to a statically defined max, and make the buffer that holds the data have length max+1, with an extra byte that is always set to null on every copy into the buffer. For malloc/free, design your code so that malloc and free are in the same function, at the beginning and end, and always run memcheck and valgrind before deploying to production.
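
A minimal sketch of the copy discipline I mean (the buffer size and names are just for illustration):

    #include <string.h>

    #define FIELD_MAX 64

    /* at most FIELD_MAX bytes of data, plus one byte that is always forced to NUL */
    static char field_buf[FIELD_MAX + 1];

    void set_field(const char *src, size_t src_len) {
        size_t n = src_len < FIELD_MAX ? src_len : FIELD_MAX;
        memcpy(field_buf, src, n);   /* copy a bounded, exact amount */
        field_buf[n] = '\0';         /* never rely on src being NUL-terminated */
    }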

Between all the other mechanisms in place, like a non-executable stack/heap, stack canaries, ASLR, and tools like memcheck and valgrind, the possibility of you producing exploitable software is as low as coding it in a "proven safe" language (if such a thing exists) and messing up some parser input that can lead to unexpected behavior. And of course, there is still the possibility of using a library that has an exploit, or coding an exploit into higher-level logic, but that is a danger for any language.

I would bet that if taking an exploitation class were a hard requirement for CS degrees, where people understood the concept of the stack, the heap, and how the different types of exploits work, the outlook on this would change. I also wonder if teaching people C++ instead of C, and relying on built-in memory management mechanisms to build software instead of making them write all the allocations by hand, has the effect of less understanding of what goes on under the hood.


> In order for a piece of software to be exploitable, you need to have an input channel that is not isolated from the functionality

And prove that your input validation is correct for every possible value, and that under stress the system doesn't start doing something weird, and there are no possible integer overflows that might cause something funny to happen and ...

> For example, of all the C/C++ code that I have seen that deals with the web, only once do I remember seeing a check on the characters in the passed data being in the set [0x10, 0x13, 0x20-0x7e], which should be the first check performed on any HTTP data passed to it

Unless it's unicode of course, which it really should be these days if we're talking about the web, in which case you really want a unicode parsing library to validate it, and then you're reliant on that having no exploits, and anything of any significant complexity usually turns out to have some somewhere.

> "Between all the other mechanisms in place, like non executable stack/heap, stack canaries, ASLR, and the tools like memcheck and valgrind, the possibility of you making an exploitable software is as low as coding it in a "proven safe" language"

This is provably false though, and that's part of the point of the article: even well-looked-after software written by experienced people with an eye on security suffers from problems, and with unsafe languages there are classes of attack that are just not possible with safer languages. These keep on happening regardless of the apparent skill level of the practitioner. The author has been doing it for 25 years and doesn't think they could do it reliably. I was a C programmer for 15 and I agree: past a certain level of complexity, it looks like there are just going to be errors of this sort somewhere in your code.

Sure, programs written in 'safe' languages are not bug-free or exploit-free, but for public-facing, networked code they do seem to be better as they eliminate a huge number of damaging attack types straight off.


>And prove that your input validation is correct for every possible value

You don't need to check against every single value; you just need to check formatting, length, and data ranges. For your integer overflow example, you have made two mistakes already: first, anything that is expected to be positive should be an unsigned integer, and second, when you parse the string, you fail to check against the maximum integer value.
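
A sketch of the kind of range check I'm talking about, using strtoul (the field name and limit are made up for the example):

    #include <ctype.h>
    #include <errno.h>
    #include <stdlib.h>

    /* parse an unsigned decimal field, rejecting junk and out-of-range values */
    int parse_port(const char *s, unsigned *out) {
        if (!isdigit((unsigned char)s[0]))
            return 0;                    /* strtoul would silently accept "-5" */
        errno = 0;
        char *end;
        unsigned long v = strtoul(s, &end, 10);
        if (errno == ERANGE || *end != '\0' || v > 65535)
            return 0;                    /* overflow, trailing junk, or too large */
        *out = (unsigned)v;
        return 1;
    }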

>Unless it's unicode of course

Then validate unicode, lol. All this stuff has defined rules. Furthermore, if you design your code not to support unicode and then someone inputs a name in non-ASCII, the request will be rejected, which is a much better issue to have security-wise than blindly accepting any and all input without validation.

>This is provably false though

Based on what, the opinion of some programmers? That's not really "proving" anything. If you want to make the statement that most programmers don't understand things like why you should never rely on strlen, I'd probably agree.

But I'm not really concerned with the distribution of skill among modern programmers; I'm arguing against the assumption that "you should not write stuff in C because it's very hard to write safe C code".


> You don't need to check against every single value

I said prove for every value, not check every value.

> first, anything that is expected to be positive should be an unsigned integer, and second, when you parse the string, you fail to check against the maximum integer value.

I wasn't necessarily confining myself to input validation when talking about integer overflow; there are other places and ways it can be caused, and other vulnerabilities in systems than pure invalid input. Maybe I pass perfectly valid input that causes a small bug somewhere deep inside your system to behave weirdly. The point is that it's not as simple as "I validate my input and everything's fine".

> Then validate unicode, lol.

That's the point I was making there, you can't just validate ascii these days, and validating unicode's not as trivial as you're making out, plus you now have a unicode parser which may contain any of these problems too.

> Furthermore, if you design your code not to support unicode

I think your ideas are about two decades out of date here. There's more to the world than ascii and there has been for a long time.

> Based on what, an opinion of some programmers?

Based on where exploits happen and how bad they are.

> But I'm not really concerned with the distribution of skill among modern programmers, I'm more interested in the assumption that it is hard to write safe C code in comparison to other languages,

Do you not see the inherent contradiction in the way you've stated that? "It doesn't matter if most people can't do it, that doesn't make it difficult"

Plus, you've told us yourself, there are a ton of extra tools and techniques needed to even attempt to make 'safe' code in C compared to other languages where these classes of errors are just impossible by design. Does this not say "harder to make safe" to you?

> and learning that is beyond reach of most people.

The point in the article is that it's quite likely beyond basically everyone, and this is the conclusion a C programmer has come to after 25 years of bitter experience.


I'm really starting to hate the internet more and more every day.

You say validating unicode is not trivial, and mention using a parser which can contain bugs.

Or, you can just look at the specification like this one:

https://docs.oracle.com/cd/E18283_01/server.112/e10729/appun...

And easily write your own validator to check against valid byte ranges.
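
For what it's worth, a rough sketch of such a byte-range validator, following the RFC 3629 table (the overlong-encoding and surrogate ranges are exactly the parts that are easy to get wrong):

    #include <stddef.h>

    /* returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise */
    int utf8_valid(const unsigned char *buf, size_t len) {
        size_t i = 0;
        while (i < len) {
            unsigned char b = buf[i];
            size_t n = 0;                          /* number of continuation bytes */
            unsigned char lo = 0x80, hi = 0xBF;    /* range for the first continuation byte */

            if      (b <= 0x7F) { i++; continue; }
            else if (b >= 0xC2 && b <= 0xDF) n = 1;
            else if (b == 0xE0)              { n = 2; lo = 0xA0; }  /* no overlong forms */
            else if (b >= 0xE1 && b <= 0xEC) n = 2;
            else if (b == 0xED)              { n = 2; hi = 0x9F; }  /* no surrogates */
            else if (b >= 0xEE && b <= 0xEF) n = 2;
            else if (b == 0xF0)              { n = 3; lo = 0x90; }  /* no overlong forms */
            else if (b >= 0xF1 && b <= 0xF3) n = 3;
            else if (b == 0xF4)              { n = 3; hi = 0x8F; }  /* <= U+10FFFF */
            else return 0;                   /* 0x80-0xC1 and 0xF5-0xFF are never valid */

            if (len - i < n + 1) return 0;   /* truncated sequence */
            if (buf[i + 1] < lo || buf[i + 1] > hi) return 0;
            for (size_t k = 2; k <= n; k++)
                if (buf[i + k] < 0x80 || buf[i + k] > 0xBF) return 0;
            i += n + 1;
        }
        return 1;
    }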

I'll just leave this conversation with "Agree to Disagree".


The point is it's not as trivial as you say, and there's more to safety than pure input validation.

The author's point was that even people who think they're doing it right don't catch everything. If you think you do then I wish you the best of luck, but I also wouldn't want to work on safety-critical systems with you.


> As far as memory safety goes, never rely on null-terminated strings; always copy the exact amount of data up to a statically defined max, and make the buffer that holds the data have length max+1, with an extra byte that is always set to null on every copy into the buffer. For malloc/free, design your code so that malloc and free are in the same function, at the beginning and end, and always run memcheck and valgrind before deploying to production.

Aren’t you just proving the author’s point here? Even your fairly simple heuristic that works in most cases is easy to screw up due to a typo / brain fart, and it’s quite likely that even experienced developers will write code that memcheck/valgrind complains about (which is why the tools exist!)

So writing memory-safe code in C/C++ is brittle and not easy to do consistently without additional tools and checks. The author’s point wasn’t that it’s impossible or conceptually deep, but that it’s tricky and painstaking work which is easy to mess up.


>Even your fairly simple heuristic that works in most cases is easy to screw up due to a typo / brain fart,

This argument can be made for "safer" languages just as easily. Most of the web exploits that exist, like request smuggling, parser abuse, or generic exploits involving a sequence of API calls, are all due to brain farts. At a certain level you have to expect some competency.

My main argument is that the competency for writing C code isn't that much higher than for other languages.


> My main argument is that the competency for writing C code isn't that much higher than for other languages.

I think that goes against observed reality, and the consequences of these errors are worse in unsafe languages. Look at Heartbleed: a bounds-checked language would not have allowed that problem to be anything more than an exception; instead, process memory was open for reading.


Easy counterexample: most of the computers worldwide run the Linux kernel, which is written in C, and kernel exploits, especially those reachable from the web, are significantly rarer than the higher-level exploits in CRUD APIs or LAMP stacks that work at the HTTP protocol level and in the authentication state machines.

This argument doesn't make sense. Yes, programmers make mistakes. Yes, they shouldn't make those mistakes. My point is that it's not difficult to write safe C code if you follow some basic rules.


> Easy counterexample:

The Linux kernel is not really a counterexample here, and neither would I call the decades of security hardening that have gone into it "easy"...

> Yes, programmers make mistakes.

And in some languages those mistakes have far worse consequences than others, and more types of mistakes are possible.

> My point is that it's not difficult to write safe C code if you follow some basic rules.

Except it is more difficult, for a start because you need more rules and tools to make it safe, if that's even possible. As you yourself have said.


One nice thing about newer C++ is that you can write safer (in some sense) functions, to a point. constexpr/consteval functions that are evaluated at compile time cannot have UB and do not interact with global state; this is enforced by the compiler. So building up from this, you get safer systems. One cannot say they are safe though, as it is complicated.
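
A small sketch of that point, assuming C++20 (the function is made up): because a consteval function has to be evaluated by the compiler, anything that would be UB at runtime becomes a compile error instead.

    #include <array>

    consteval int sum_first(std::array<int, 4> a, int n) {
        int s = 0;
        for (int i = 0; i < n; ++i)
            s += a[i];          // an out-of-bounds read here is not UB at runtime;
                                // it is a compile error, because the whole call is
                                // evaluated by the compiler
        return s;
    }

    static_assert(sum_first({1, 2, 3, 4}, 4) == 10);
    // static_assert(sum_first({1, 2, 3, 4}, 5) == 10);  // does not compile: reads past the array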

This comes back to: is the whole system safe, what do I trust, and how have I mitigated my mistakes (there are always mistakes)? No purportedly safe system will stop me from making logic errors in my conditionals. They will stop an access outside a valid object or range. But C++ also has tools to mitigate this: is a for loop good here, or should it be a function that works on a range and is tested properly? Can I enforce my constraints in a type so that the compiler does the work for me?


> One nice thing about newer C++ is that you can write safer(in some sense) functions.

But in practice you rarely find these in the codebase you inherit.


That goes for any system; legacy code is good and bad. Often it's battle-hardened, but looks super complicated because of the post-release organic growth of fixes. Green-field projects can be engineered nicely (if you have the time...) but are missing the real-world exposure needed to find all our incorrect assumptions and errors.


Programming today is fundamentally designed to function rather than be correct. Correctness is the second stage that any programmer experiences, when you rewrite a naive implementation and get a performance boost or behaviour that is less likely to fail. But really all that has happened is that any first, completely new implementation has unknown unknowns, and rewriting it improves your code and turns them into known unknowns. There is still tons of stuff that I don't understand that will fail for unknown reasons; there's just hopefully less of it.


> Programming today is fundamentally designed to function rather than be correct...

As long as such code is coupled with reasonably complete tests for the expected behavior (think of fuzzing as one of such tests), it's a quite valid and "safe" approach.

Of course, it's possible to write "unsafe" code, yet pass most of business-related tests. But this is equally possible with any programming language.

Unsafe assumptions will lead to unsafe implementation.


Everyone makes mistakes, but trying to write correct code is important. You have to be conscious of the consequences of what you write. You shouldn't rely on non-fixed-width types having specific lengths, or on signed integer overflow, unless your code guarantees that things will work (preprocessor/static_assert checks against standard or compiler-specific properties). The worst offenders I often see are violations of aliasing rules, especially in ways that could cause alignment issues.

In C, a function isn't an object: you can't convert a function pointer to a void pointer and back and then use it. Yet I've seen that done many times, and it's probably safe on things that act as if they have a von Neumann architecture.

If I'm relying on implementation-dependent functionality, I try to make it so that whatever it is will fail to compile instead of having incorrect behaviour.
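
A tiny sketch of that "fail to compile instead" approach (the particular assumptions checked here are only examples):

    #include <climits>

    // make implementation-specific assumptions explicit, and fatal at build time
    static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
    static_assert(sizeof(long) == 8, "this code assumes 64-bit long");
    static_assert(sizeof(void (*)()) == sizeof(void*),
                  "this code round-trips function pointers through void*, "
                  "which ISO C/C++ does not guarantee");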


Better than you being aware, make sure your build system is aware - for example, compile your code with the maximum level of warnings and errors turned on, and don't commit it until you have resolved them all.


+1. Yet this should be done from the very start of the project.

Too often projects start out in a permissive way, and then get released. At that point, switching to the strict way may have become an insurmountable task, practically or politically.


Definitely: `-Wall -Wextra -pedantic` are always there, and individual warnings only get turned off if there's a good reason.


> In C, a function isn't an object: you can't convert a function pointer to a void pointer and back and then use it.

It is not defined by the C standard, but POSIX requires it. So any environment (including the compiler) claiming POSIX conformance must support it.

Not sure how it works on Itanium though as, IIRC, function pointers there are twice as wide as normal pointers.


Plenty of platforms don't care about claiming POSIX conformance.


> I've heard maybe Daniel J. Bernstein can

DJB simply redefines his unsafe code as safe[0] and asserts that you're Using It Wrong:

--

In May 2005, Georgi Guninski published "64 bit qmail fun", three vulnerabilities in qmail (CVE-2005-1513, CVE-2005-1514, CVE-2005-1515): http://www.guninski.com/where_do_you_want_billg_to_go_today_...

Surprisingly, we re-discovered these vulnerabilities during a recent qmail audit; they have never been fixed because, as stated by qmail's author Daniel J. Bernstein (in https://cr.yp.to/qmail/guarantee.html):

> This claim is denied. Nobody gives gigabytes of memory to each qmail-smtpd process, so there is no problem with qmail's assumption that allocated array lengths fit comfortably into 32 bits.

Indeed, the memory consumption of each qmail-smtpd process is severely limited by default (by qmail-smtpd's startup script); for example, on Debian 10 (the latest stable release), it is limited to roughly 7MB.

Unfortunately, we discovered that these vulnerabilities also affect qmail-local, which is reachable remotely and is not memory-limited by default (we investigated many qmail packages, and all of them limit qmail-smtpd's memory, but none of them limits qmail-local's memory).

As a proof of concept, we developed a reliable, local and remote exploit against Debian's qmail package in its default configuration. This proof of concept requires 4GB of disk space and 8GB of memory, and allows an attacker to execute arbitrary shell commands as any user, except root (and a few system users who do not own their home directory). We will publish our proof-of-concept exploit in the near future.

About our new discovery, Daniel J. Bernstein issues the following statement:

> https://cr.yp.to/qmail/guarantee.html has for many years mentioned qmail's assumption that allocated array lengths fit comfortably into 32 bits. I run each qmail service under softlimit -m12345678, and I recommend the same for other installations.

--

[0] https://www.openwall.com/lists/oss-security/2020/05/19/8

PS: HN's markup keeps sucking and making comments unreadable


"Nobody gives gigabytes of memory to each qmail-smtpd process"?

-----

QUESTION: "I read in a newspaper that in l981 you said '640K of memory should be enough for anybody.' What did you mean when you said this?"

ANSWER: "I've said some stupid things and some wrong things, but not that. No one involved in computers would ever say that a certain amount of memory is enough for all time."

https://www.wired.com/1997/01/did-gates-really-say-640k-is-e...

-----

If I were a cynical programmer I might think DJB just doesn't want to hand over a $500 check.


In other words, the instructions on how to run qmail without security holes were clearly displayed at the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard'.


Layman warning: I never really understood (as an outsider who does not code) why code can be unsafe. Is code more like art (painting) than math (writing an equation which balances itself)?


Ever seen a 3D printer in operation?

3D printers use a "language" called gcode. It's not really programming, it's a series of commands that tell the 3D printer nozzle to move to a certain location at a certain speed while extruding at a certain rate. There are a lot of ways you can mess that up, you can tell the nozzle to go as low as it can and just start extruding, giving you a big blob on the bottom of your 3D printer. You can tell it to move to a position that it physically can't, outside of the bounding box it can print in. Most 3D printers don't have endstops to prevent you from going too high on an axis, so they'll try to do that and tear themselves apart. You can try to extrude while your extruder isn't up to heat and grind down your filament. You can physically jam your extruder into whatever it is you're printing. There are all kinds of things that 3D printers are physically capable of doing that are unsafe.

Computers are just a machine like a 3D printer. You're "physically" moving bytes of data around (that's where most of the heat comes from), doing operations on them, etc. Nowadays you generally can't get them to destroy themselves but in the earlier history you absolutely could tell the machines to tear themselves apart in the same way you can tell a 3D printer to tear itself apart.

Computers are just machines for moving bytes around, and it's really hard to make a machine that you can only do safe stuff with.


> Computers are just a machine like a 3D printer. You're "physically" moving bytes of data around (that's where most of the heat comes from), doing operations on them, etc. Nowadays you generally can't get them to destroy themselves but in the earlier history you absolutely could tell the machines to tear themselves apart in the same way you can tell a 3D printer to tear itself apart.

This is wrong. It's not what 'unsafe' means for C or C++. You could have 100% safe hardware for your code to run on, and your code could very well still be unsafe. The phrase you're looking for is Undefined Behavior.


I think that "you can also move a byte into the wrong place and have that change your program" was implied implicitly, but sure.


The code can be unsafe because it is physically impossible to test for every input in a computer. This is where various engineering designs come in which reduce the area of testing based on some theories.

Also, even in math, there are enough mistakes in publications (not just typos, but reasoning errors) which hopefully do not affect the eventual results in any fundamental way. The equivalent of safe code in computer science would be equivalent of completely formal proofs in mathematics (like in Coq and similar languages), but probably much more difficult due to existence of temporal conditions.


Err there are other ways to prove (memory) safety than exhaustive testing, such as better type systems and static analysis (Rust) or better run time checks (any garbage collected language)


Memory safety is not the only type of safety though. There are race conditions for example.


There is a fundamental difference between math and code. In a typical modern-day computer, instructions and the data to be manipulated are put in the same place [1]. This is a critical feature that makes things like downloading a program and running it possible. But for programmers to keep their code "safe", they must enforce artificial boundaries between the instructions (code) and the data. Hackers are experts at crossing those boundaries and tricking computers into treating data as code when they shouldn't.

Mathematicians have the good sense to keep their data and instructions separate. [2]

[1] https://en.wikipedia.org/wiki/Stored-program_computer [2] https://en.wikipedia.org/wiki/Field_(mathematics)


Compare it with tax law: lawyers, legislators, and many other people dedicate enormous amounts of time to create tax laws, and still tax evaders (hackers) find loopholes to exploit. Those loopholes are the law's bugs.

One could argue that those loopholes are left intentionally (back doors), but even if all actors were honest, bugs would still happen from time to time.

Computer code has even more bugs because it is produced massively and quickly, without the bureaucracy of tax code. Anyone can create an awesome application/library/framework/etc., and share it freely with the world. People end up using these projects as stepping stones for their own projects, creating a complex layered cake where bugs can hide for years.


I don't know about the art vs math question but, taking the math example, your code can be unsafe in the same way your maths can be wrong (i.e. maybe you start with a bad premise or your derivation is invalid). More generally I'd describe both of these situations as 'unsound' and actually they manifest themselves in the same way in both disciplines (an oversight, incorrect model, complexity, etc).

You might think that if you do maths on the computer, maybe it can help you keep things valid as you execute your derivations, and something similar can be done for coding. This is true, in maths/logic they have theorem provers and in coding we have static typing. Again they literally manifest themselves in the same way in both disciplines due to the Curry-Howard Correspondence[1].

You can also argue that in conjunction with static typing there are also linters, etc, but I anchor specifically to static typing in this example because of how directly it relates to your math comparison.

[1]: https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspon...

EDIT: spelling


Ever seen one of those prank videos where someone is in the shower rinsing shampoo off their head, and the prankster leans over the shower wall and squirts a bit more shampoo onto their head, and the prankee gets more confused and annoyed when they keep rinsing "endless" amounts of shampoo that should be done by now?

Buffer overflow and "unsafe" code is like that - the showering person isn't painting or equating, they're expecting an end condition "when the water coming off my head stops having soapy lather and runs clear" which works every time, but is not a "safe" pattern - it assumes no malicious intervention. Someone else can change the surrounding circumstances so that the end condition doesn't happen when it should, and "cause" the rinse routine to keep running for longer and longer.

Buffer overflow attacks are like this: they're expecting to read some data and stop when they get to an end condition; when the code is badly designed, an attacker can change something to delay the end condition and cause more data to be read. Inside a computer there are no such things as "separate applications" or "security boundaries" or "program data" or "OS instructions", except that the patterns of numbers are supposed to be interpreted as those things. If a program can write "program data" but cannot give the OS instructions, maybe it can drop some more shampoo on the OS's head and cause the OS to keep reading more and more "OS instructions", only now it's reading past the normally expected end and into the "program data" location, and the same numbers which were once "safe program data" become "OS instructions" to be executed by the OS with its OS permissions, which the program had no original right to do. The imaginary security boundary is broken by exploiting the assumptions baked into some other code that is running.
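
In code, the shampoo prank looks roughly like this deliberately bad classic:

    #include <string.h>

    void greet(const char *attacker_controlled) {
        char name[16];
        /* strcpy keeps copying until it finds a terminating NUL byte -- the "end
           condition". If the input is longer than 15 bytes, the copy runs past
           'name' into whatever is stored next on the stack (other variables,
           the saved return address, ...). */
        strcpy(name, attacker_controlled);
    }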


I think it's more like laws (of man, not nature) than art or math. There are complex rules that define how a computer and a programming language on that computer function. Even if we assume these rules are perfectly defined and not bug-ridden, the programmer still needs to understand them and write contracts (code) with no logic errors that can lead to any number of disastrous scenarios, such as data loss, data corruption, or data leaking, among others.

Take this immense complexity of the computer and the programming language, compound it further with the requirements of the business problem being solved, throw in a tight deadline, and you have a recipe that leads to the vast majority of code being buggy.


Software is more like building a machine than either math or art. There've been attempts to make formally-provable programs (so it _is_ like math) but these are not in widespread use.

Go watch the Lockpicking Lawyer on youtube pick locks and trivially crack/open every lock ever made. This is, roughly, the best physical analog to what happens with computer programs and safety. The creators are trying but they have to be correct everywhere, from every angle, and the attacker only needs to find one weakness to break it.


Programming is a craft. In the same way the Pentagon and the White House have structural protections against certain kinds of attacks, programs have certain kinds of defences against certain kinds of attacks.

Defences are necessary when the program interacts with users or external inputs of any form. This can be inputs in the form of text or files, or even by interfacing with a program, e.g. the malicious code executes system code in a specific manner to cause certain side effects.


In mathematics, you need to assert a set of axioms (or preconditions) under which the theorem is held to be true. These axioms can be challenging to figure out; naïve set theory was destroyed by Russell's paradox. Rather famously, the axiom of choice is equivalent (in the sense that assuming one, one can prove the other) to the well-ordering principle, and yet one is "obviously" true and the other is "obviously" false.

Euclid proved a lot of statements in geometry using several axioms, but the last one was clunky and seemed to be something that ought to be a theorem instead: this held that, given a line and a point not on that line, there was exactly one other line that was parallel to the first line passing through said point. Eventually, though, it was found that there was a reasonable interpretation of geometry where that axiom is not true, whence spherical geometry (parallel lines do not exist) and hyperbolic geometry (many lines can pass through that point and remain parallel).

Another example is in physics: the Crystallographic Restriction Theorem mathematically restricts the kinds of forms that crystals can take. And yet, in the 1980s, several crystals were demonstrated which had five-fold symmetry, which is forbidden by that theorem. The issue is that the theorem presupposes that crystals need to be symmetric under linear translations, but there exist forced tilings that have rotational symmetry but not translational symmetry--and these can have five-fold symmetry. (We now call these quasicrystals.)

In CS, "unsafe code" amounts to code where programmers did not assert all of the possible preconditions to their code. In contrast to much of mathematics, failing to assert all of the preconditions for safety is remarkably easy in some languages, chiefly C/C++.


When programmers call code unsafe, what they mean is that the code can be unpredictable, and unpredictability leads to unintentional behaviors, which are generally bad since good behaviors are intentional. Code can be unpredictable because it's not written in a vacuum. Not only does code exist within the context of other code, it exists within the context of a compiler or interpreter, possibly a runtime, an operating system and firmware, a CPU architecture and memory and disk and networking and, of course, user input. All of these things make for an incredibly complex system, and incredibly complex systems produce emergent behaviors. So it is very difficult to work within such a system and add behaviors to it without creating unintended consequences. This is especially true the more complex the interactions of the components of the system, and C/C++ allow for very complex interactions. There's more to it than that of course, but I think that's what underlies most of it.


Imagine writing code for an elevator. If there is a glitch in your code such that when the date changes from 1999 to 2000 it releases all the ropes... that would cause a bunch of people to die. Something like this is exceptionally unlikely, but if you're writing code for a real-life device you should always, always think about its implications.

Read this: https://en.wikipedia.org/wiki/Therac-25

This radiotherapy machine had a software bug which caused six people to be given massive radiation overdoses.


Imagine trying to assign a unique number to every bit of data your program uses, including stuff like text, pictures, etc., such that some text that uses 100 bytes uses 100 numbers, a picture with 1,000,000 bytes uses 1,000,000 numbers, and so on.

You can just say "Start at number 0 and create a new number for each bit of data", but then maybe that JPEG your program uses occupies the same set of numbers as the text you're writing to. So you need to make sure it's all unique, and that each logical thing you're storing gets its own unique set of numbers. Easy enough, except data changes as your program runs, so every now and then you need to say "ok there's not enough space to store this thing, so I'm going to assign it a new number so that it doesn't conflict with this other data I have."

That works well enough, except what if parts of your software do stuff like "write value X to number 103820"? Will that do what you want? Maybe that code is responsible for updating some text somewhere, but what if that text grew too big and moved somewhere else? How do you know if the number it's writing to is actually the right text?

What's way worse, is that some of these numbers are used by the processor for bookkeeping on things like "what was the last bit of code I was executing before I ran this code?" and if you overwrite those numbers, you can cause the processor to do evil things.

That's memory safety. It's the idea that, if you just let code write to arbitrary locations in memory, it's very very difficult to do this safely. The answer ends up being to have languages that simply don't let you do that, and that's a big step towards having safe code. "Safe" languages instead only let you do things like "append to this data", which will automatically move the data to another address if it's too big. But they won't let you just write to arbitrary addresses. Even "Safer" languages ensure that one thread can't be in the middle of moving some data to a new address while another thread is trying to write to it, etc etc.

So to your question, it's very much like painting in that regard. If you start in one corner of the canvas and draw something way too big without leaving yourself enough room, you'll paint over parts of the painting you wanted to keep. Since programs are super dynamic, the problem of making sure everything has enough space to be represented in a real computer ends up being kind of hard, and the way older languages are designed can sometimes make it nearly impossible.
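
The "what if that text grew too big and moved somewhere else?" case, in a few lines of C++ (a deliberately simplified sketch):

    #include <vector>

    void example() {
        std::vector<int> v = {1, 2, 3};
        int *p = &v[0];    // remember the "number" (address) of the first element
        v.push_back(4);    // growing may move the data to a different set of numbers
        *p = 7;            // the old number may now belong to something else entirely:
                           // undefined behaviour, and no compiler error to warn you
    }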


"unsafety" is a very overloaded term. In this context, one specific technical meaning is assumed: _memory safety_ [0] (not type safety, not safety from hacking, etc, although they do depend on memory safety).

Programming languages are tools for building abstractions and concrete implementations of abstractions. They are very rarely verified and described mathematically and exhaustively; it is possible to state some properties of some programs, but it is mathematically impossible to decide any meaningful property for an arbitrary program [1].

However, it is possible to constrain the abstractions used in a way that allows upholding some properties (to some degree). Memory safety of a language means that a program in that language can change a variable/state if and only if it "touches" it semantically (e.g. via a valid pointer/reference). A memory-safe language creates a reliable "abstraction cage" for (almost) any program written in it that guarantees (though not necessarily mathematically) that unrelated variables cannot be changed. "Glitches in the Matrix" (changing one variable magically changing a random other one) are still possible, but very rare in practice. Examples: Java/Python (which incur significant inefficiency when executing a program), and more recently (the safe part of) Rust, which often comes very close to C/C++ in efficiency while retaining memory safety in most of its code.

C/C++ are examples of memory-unsafe languages: their memory abstractions are not even close to an "abstraction cage"/"Matrix"; they are just thin "guardrails" and guides, not enforceable rules, and it is easy to read or corrupt an unrelated variable in a program (sometimes even via malicious input to the program). This design choice was semi-deliberate: C/C++ solve the task of programming existing computer hardware efficiently, and nobody knew how to create a practical, efficient and memory-safe systems programming language even twenty years ago. It is possible for a coder to code "defensively", using empirical best practices and tools to reduce the possibility of using program memory incorrectly. C++ has a subset and tooling that come tantalizingly close to memory safety, but it is still a costly uphill battle, and even the best C/C++ coders/organizations fail to avoid memory misuse.

[0]: https://en.wikipedia.org/wiki/Memory_safety [1]: https://en.wikipedia.org/wiki/Rice%27s_theorem


Simplest example I can think of:

Your maths function takes a variable and divides by that variable. What happens if that variable is set to zero?
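
In C or C++ that looks like the snippet below; note there is no exception or error message, just undefined behaviour (purely an illustration):

    int ratio(int a, int b) {
        return a / b;   // if b == 0 this is undefined behaviour: it may crash,
                        // return garbage, or be "optimized" into something surprising
    }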



But can you write "consistently safe" (whatever that may mean) programs in any other language?


Obviously, this depends on your definition of "safe".

But there's a fairly large set of "safety" issues that, in 2021, effectively only C and C++ have. Other than straight assembler, which by its nature will always be with us in some sense, there aren't any other languages in common use anymore with similar memory safety issues. I'm not sure I can think of any other languages with "pointer arithmetic". Almost nobody else is using NULL-terminated strings. And so on and so on.

(Please read that carefully. I'm not saying every language other than C and C++ is completely safe by any definition. I'm saying there are significant weaknesses that only C and C++ have nowadays. Thread unsafety, mutable state management issues, bad/unhelpful type systems - there's plenty of common unsafety out there today, but C/C++ have their own nearly-unique entries on that list.)

C++ nominally has solutions to any given weakness, but in practice they're at the very least difficult to use in isolation, and very, very complicated to completely correctly use in combination with each other, to say nothing of code bases that inevitably end up having to deal with multiple solutions intersecting in the same code base because of two important libraries that have to do things differently or whatever.


C is assembly on steroids which gives you just the essential features of an HLL, without any of the fancy stuff which would complicate the translation and would potentially require a runtime environment.

Any C construct/operation has a canonical representation as a short series of instructions found on most machines. Essentially, what you get is expression-oriented syntax (operations are expressions yielding a result), support for structured programming (loops, conditionals), automatic register allocation and stack frame management through named local variables, abstraction for calling conventions and a rudimentary type system around integers and pointers.
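
As a rough illustration of the "short series of instructions" point (actual output varies by compiler, flags and target; this is approximately what an optimizing x86-64 compiler emits):

    int add_scaled(int a, int b) {
        return a + 4 * b;
    }
    /* roughly:
           lea eax, [rdi + rsi*4]
           ret
    */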


"issues that, in 2021, effectively only C and C++ have"

"I'm not sure I can think of any other languages with «pointer arithmetic». Almost nobody else is using NULL-terminated strings. And so on and so on."

The thing is, at least a while ago, these were features that C and C++ got chosen for, precisely because they enable someone to do things quick and dirty (mostly for "quick", with "dirty" only as an assumed consequence, but I've also seen both C and C++ chosen for "dirty" alone plenty of times). We had safer languages (the one named after lovely Mrs. Lovelace, for instance), yet those options simply could not persuade the people that like it raw. Now we clearly have a stronger techno-political pressure lobbying for wearing straitjackets, so at least it's interesting to watch.


There are lots of languages that are not "safe" in your terms. I'll limit them to ones I am fairly expert in - assembler, FORTH, Object Pascal - but there are many, many others.


FORTH and Object Pascal are not in common use.

Languages actually 100% dying are rare. I'm very, very confident that there are people out there working in them full time, and making a good living.

But they are not even remotely in the same tier as Java, Python, Go, C++, C, Objective C, etc.

Again, this depends on your definition of "common", but... can anyone tell me with a straight face that Java and FORTH are in the same class of usage?

Also, I already called out assembler as an exception. By its nature it will always be unsafe. This is fine by me, because any restriction it laid down would be something that no language above it could possibly get around, no matter how good an idea that might become in the future, which would be dangerously restrictive over the long term. I don't look to assembler to provide language-level safety. I don't want assembler to, say, rigidly protect the "private" fields of objects from exterior access, because then my debuggers and state viewers and other tools become impossible.


Object Pascal is definitely safer.

- bounds checking

- proper strings

- strong typed enumerations

- parameter references cannot be forced to nil (C++ ones can be tricked into null and C doesn't have them anyway)


Is there a spec for Object Pascal?

I'm wondering what document I should follow, if I wanted to write a compiler for it.

UPDATE: From what I see[1], there is a proposed standard, but no finalized one.

[1]: <http://pascal-central.com/standards.html>


Should be a bit easier with Rust or Ada I think.


Has this changed in 3 years? Was the author really writing about modern C++, or C++ as he remembers it? One of the comments on that post is pointing out the introduction of std::array with C++11.


Strictly speaking, it can never change. The features that create safety pitfalls are still there, and they can never be removed without breaking backward compatibility. And you can't simply avoid using those features, because their use is baked into the standard library, or into some other de facto standard library that you can't realistically live without.


A plethora of static analysis tools makes it possible to write code that's "safe enough", I would imagine, for some value of "safe enough".


Now if people would actually use them.

> Which of the following tools do you or your team use for guideline enforcement or other code quality or analysis?

https://www.jetbrains.com/lp/devecosystem-2020/cpp/

With the best value being 36%.


Then why have we seen ITW exploits against Chrome, or Linux? These are C and C++ codebases that undergo tons of static analysis and testing - tons of research goes into both of those projects to make them safer.

Still vulns. Still exploits.


Don't forget sanitizers combined with comprehensive test suites. I would always recommend doing both static and dynamic analysis.


> I cannot consistently write safe C/C++ code.

There's a pretty big ecosystem of tools and techniques beyond just the compiler for writing safe code. Unfortunately, they all have usability issues, so the burden of writing safe programs often falls almost entirely on the individual.


Valgrind helps a lot though, it usually solves my memory errors. (If that doesn't do it, then it's either just a logic bug, or you have a really strange bug in the first place!) The main issue people have with it is that its terminal interface is almost always unusable (you get thousands of lines of gibberish text), and there aren't many good Valgrind GUI frontends available. Thankfully CLion has an integrated valgrind inspector that really helps a lot.


Valgrind is good but it's a dynamic tool, so it won't catch anything you don't exercise. There are better options, but there are usually some extra barriers to using them.

By usability I also meant getting it to work for your setup and learning how to use it.



I'm guessing this post is another low-key Rust promotion.


There are plenty of safe alternatives.


I have found that most people who classify themselves as a “C/C++ programmer” are incompetent at both C and C++.

There are many competent C programmers and there are competent C++ programmers, but people who try to be C/C++ programmers end up with the worst of both worlds. They lose the simplicity of C without going far enough to gain the full benefits of C++.


I've had to ping pong between C and C++17 on some projects where I've written Linux kernel modules, and then userland C++ code to exercise said modules.

I find it very hard to transition back to a C mindset coming from modern C++, not so much the other way around. Going from kernel C to userland modern C++ is akin to getting that first gasp of fresh air after nearly drowning.

Granted, that _could_ be a function of the "complexity" of writing kernel code, in that it can take up a lot of mental real estate, and there's less "risk" involved with userspace code...


But would you call yourself a C/C++ programmer? Or a C programmer and C++ programmer?


In English, / is an abbreviation for "and".

It's an abbreviation widely accepted by several institutions with a say in programming language standards; only people on forums get uptight about writing C/C++.


I would call myself a programmer. No language prefix.

Languages are tools.


This is completely misguided. People who call themselves C/C++ programmers are basically developers who are proficient in systems programming, not necessarily in C or C++ specifically. It just happens that C and C++ were the only primetime options until the likes of Go and Rust entered the scene.

Also if you know C++ to a decent enough extent, you are going to have a certain level of command over C too. And there is no reason why calling yourself a C/C++ Programmer should undermine your proficiency in either of those languages.

Broad generalizations are generally bad.


Thankfully, no official ISO C++ documents mention C/C++ anywhere, nor has Bjarne ever written such a thing.


I agree that people that think there is a language called C/C++ probably don't have much of a clue, but it is certainly possible to program in both C and C++ and do a good job in both.


In my perception there is a group of considerable size who thinks that C++ is largely b*s*, but still uses C++ (technically) for one reason or another (availability of compilers, interfacing with existing ecosystems, availability of jobs) - in a way that is basically C with almost no C++ features.

This is a group of people that considers themselves systems programmers in the first place, and doesn't really care about the language very much, except that they don't want to have to deal with stuff like this: https://twitter.com/fabynou/status/784905829866614784

I don't see any reason for the people in this group not to consider themselves "C/C++ programmers". Au contraire, I've found the people who like to point out that "C/C++ isn't a language" to be annoying nitpickers who care too much about the language and too little about systems programming.


The common subset of C and C++ is different enough from "idiomatic" C and C++ that it almost can be called a separate "C/C++" language ;)


The common subset should probably be called something like "better C" - it won't be able to use any major (or many minor) C++ features. Basically, it will be ANSI C (give or take). Notably, the code for the 2nd edition of The C Programming Language was tested using a C++ compiler, not a C compiler.


The common C/C++ subset is an outdated and non-standard C though. For instance standard-conforming C++ compilers cannot compile C99 code and onward, they're stuck at a non-standard C version that would roughly be C95, minus some things that are valid C but not valid C++.


They can compile more C11 code than C99, thanks to the stuff dropped in C11.

They also support most of the C standard library additions.

As of C++20, the biggest differences are that designated initializers must follow field declaration order, restrict and _Generic aren't supported, and the type safety semantics are stronger.


There's also a handful of small differences like this, e.g. in C it's possible to construct a value and take the address for a pointer argument right in the function call:

    func(&(bla_t){ .x=1, .y=2 });
This is completely valid C, but invalid C++ (C++ has references as a "workaround" for such ad-hoc constructed parameters which shouldn't be passed by value).
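
The usual C++ counterpart (a small sketch reusing the hypothetical bla_t/func names from the snippet above) is to take the parameter by const reference and construct the temporary at the call site:

    struct bla_t { int x, y; };
    void func(const bla_t &b);

    void caller() {
        func(bla_t{ .x = 1, .y = 2 });   // C++20 designated initializers; the
                                         // temporary binds to the const reference
    }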


Yes, that falls under my "type safety semantics are stronger" part of the comment.


I guess FAANG companies are full of such programmers, given that they use C/C++ everywhere in their docs.


I don't call myself a C/C++ programmer but I have done both professionally, am productive in both, like them both.

I understand and appreciate where idioms differ and that there are pros and cons to each.


"C/C++" is a valid term for the common subset of C and C++. Even if both are different languages, it's possible to write code which is both valid C and valid C++ (but is a subset of both languages). Usually the goal when writing "C/C++" is to write C code that also compiles in C++ compilers (and this, contrary to popular believe, is a very restricted subset of C, and is almost a 3rd language).

Also, what else would you call yourself if you write both lots of C++ code (for the job) and C code (for fun) ;)


I would call myself a C and a C++ programmer. If I also wrote Java (which I do), I would call myself a C, C++ and Java programmer.


Rust looks like Facebook, while C++ looks like every publishing system out there (Wordpress, Django, Drupal, MediaWiki, static sites, etc).

You're a lot more restricted on FB but you also can't host your online store, or really anything except: messages, groups, events, photos and videos.

C++ could probably be more regulated for certain scenarios, but if people don't paint self-hosted blogs as evil, why paint C++ that way ?


That's really hyperbolic, I don't believe there's anything you can do in C++ that you can't do in Rust.

Your argument seems a lot more valid for Python, Java, as AFAIK in those languages you can't access raw memory.

Do you have a specific example?


> I don't believe there's anything you can do in C++ that you can't do in Rust.

I think you'll have to make "can do" more precise for this question to have a meaningful answer. All mentioned languages are Turing complete and thus equivalent on that level.

> you can't access raw memory.

I can wrap a file in a ByteBuffer in java and the JVM will (make a best effort to) perform native I/O on that. Does that pass as "access raw memory"? If not, you'd have to explain how this is different from the abstract machine that C++ is defined on.

Sure, in Java you can't escape the garbage collector, but in C++ you also formally invoke UB if you violate the object lifetime requirements (even though will generally work due to this part of the standard still being... work in progress, let's say).


> I think you'll have to make "can do" more precise for this question to have a meaningful answer. All mentioned languages are Turing complete and thus equivalent on that level.

I actually agree; I was just arguing with the point of the above comment, which implied you can't do lots of things in Rust that you can in C++. I don't know what the author was referring to, so I can't be more specific.

> I can wrap a file in a ByteBuffer in java and the JVM will (make a best effort to) perform native I/O on that. Does that pass as "access raw memory"?

I was referring to directly accessing the memory "owned" by local variables (the stack), the bytecode of the program itself (on a von Neumann architecture), etc. It was a counterpoint to the implication that every other language (akin to publishing systems) had no limitations where Rust did. But I don't think that's actually a problem, as (if that's ever needed) you can FFI to another language that can do that stuff and (as you said) all those languages are Turing complete.

Sorry if it sounded like I was attacking Java & Python; for most applications I don't think the "runtime" limitation is a problem, and comparing them to Facebook is still hyperbolic.


> C++ could probably be more regulated for certain scenarios

At that point, a different language might be better anyway.

My experiences with C++ (I dust it off every year because of something that needs doing that nobody else wants to touch) are that I've usually dipped into C++ specifically for some combination of:

1. Invoking API's that have been around for a long time, usually meaning OS API's

2. Hand-optimized performance sensitive code where, for example, you're auditing every runtime memory allocation that has to be done in the inner loops

3. Modifying legacy code that's written in C++

In all of those cases you have to dip into unsafe territory regularly, so you end up with pretty much what Rust does with safe/unsafe, but with less in the way of the language helping you to ferret out your memory handling bugs. Every time I do it, even if I don't think I've created memory handling bugs, and even when my tools don't catch anything, I always leave with a low grade fear that my code will end up in a CVE somewhere.


Your analogy makes absolutely no sense.

There are operating systems written in rust. If that is possible, then anything else is possible too.


Could you explain what you mean in more concrete terms? In what way does Rust restrict the programmer so much that it's comparable to the difference between an FB page and a site-building system?



