Why is the DOS path character "\"? (2005) (archive.org)
182 points by jez on Feb 26, 2021 | 158 comments



Thanks to Microsoft's recent release of the MS-DOS 2.0 source code, we can now peek under the hood and confirm that Microsoft specifically intended for the DOS 2.0 file APIs to be compatible with Unix. From XENIX.ASM [1], the code that implements the new API:

    ;
    ; xenix file calls for MSDOS
    ;
    
    TITLE   XENIX - IO system to mimic UNIX
And the CONFIG.DOC [2] file discusses the 'AVAILDEV' option which lets the system mimic Unix even more:

    AVAILDEV = <TRUE or FALSE>
        The  default  is  TRUE which means both /dev/<dev> and
        <dev> will reference the device <dev>.   If  FALSE  is
        selected,  only  /dev/<dev>  refers  to  device <dev>,
        <dev> by itself means a file in the current  directory
        with the same name as one of the devices.
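The AVAILDEV rule described above can be sketched in a few lines of Python (the device list and function name are my own, purely illustrative):

```python
DEVICES = frozenset({"CON", "PRN", "AUX"})  # illustrative subset of DOS devices

def resolve(name: str, availdev: bool) -> str:
    """Sketch of the AVAILDEV semantics: /dev/<dev> always hits the device;
    a bare <dev> hits the device only when AVAILDEV is TRUE, otherwise it
    names an ordinary file in the current directory."""
    base = name.rsplit("/", 1)[-1].upper()
    if name.upper().startswith("/DEV/") and base in DEVICES:
        return "device"
    if availdev and "/" not in name and base in DEVICES:
        return "device"
    return "file"

print(resolve("CON", availdev=True))        # device
print(resolve("CON", availdev=False))       # file
print(resolve("/dev/CON", availdev=False))  # device
```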
Finally, an example CONFIG.SYS file from the same document:

    A typical configuration file might look like this:
    
    BUFFERS = 10
    FILES = 10
    DEVICE = /bin/network.sys
    BREAK = ON
    SWITCHAR = -
    SHELL = a:/bin/command.com a:/bin -p
I think it's pretty clear how Microsoft intended for MS-DOS to be configured, but alas IBM had other ideas...

[1] https://github.com/microsoft/MS-DOS/blob/master/v2.0/source/...

[2] https://github.com/microsoft/MS-DOS/blob/master/v2.0/bin/CON...


I really look forward to the day NT's source is released. It's a truly fascinating kernel. Imagine somebody building an entire Unix on top of NT! Like WSL, but even beyond.


NT originally had a Unix subsystem: https://en.wikipedia.org/wiki/Microsoft_POSIX_subsystem


Yeah, and WSL1 was something similar, but I was suggesting just Unix here, no Win32. Imagine GNU/NT!


If someone wanted to attempt this absolute bastardisation of God's will, they could try porting the GNU userland to run atop the ReactOS kernel.

We'd finally have the GPL-licensed microkernel OS of Stallman's dreams.


Apart from fork() and custom async io (epoll, iocp) aren’t they pretty compatible even without cygwin? GNU relies on POSIX (afair), so wouldn’t NT features be unused extra?


Win2k and WinXP source code was leaked, if I am not mistaken


Release != leak


Even the leaks contain enough of Microsoft's spirit in their comments: https://gist.github.com/turbo/75f0905275c29a3049f983cfe273ea...

;)


But sadly a lot of interesting fun and research that could be done with this code won't ever see the light of day because lawyers are scary.


What's special about the NT kernel that's not in the Linux kernel?


* It is (with a lot of ifs and buts) a microkernel

* it is optimized for integration with third party software from people who don’t have the source, so for instance the driver model is interesting

* the built in configuration system (the registry) and how it’s used throughout

* the (underused) personalities system you can use to show different apis to different binaries

* the security model is much more interesting, while in Linux you have ‘root’ and everything else, on Windows this is much more granular (unfortunately it’s so complex it’s basically impossible to use).

The architecture is really quite interesting, even though Microsoft didn’t make a lot of use of a large part of it.


If I'm not mistaken, the Win32 API is actually a subsystem to the NT kernel. You can call the kernel itself a layer below with functions beginning with 'Nt*'.

Hardware is actually mapped as an object namespace, which is presented to the user as the drive letters. This was exposed with Windows XP booting in safe mode; during the boot process it would print file paths as object paths, not drive paths.

Much as I'm more in the *nix way now, there's plenty of curiosities to explore and tinker in Windows.


>in Linux you have ‘root’ and everything else

Well, you do have:

- capabilities (which are so coarse-grained as to be practically useless)

- 8 types of namespaces

- seccomp-bpf

- LSM (AppArmor, SELinux, TOMOYO, etc)


The big difference is when you're in a multi-computer (Active Directory / NIS / LDAP) environment. On UNIX all the IDs are smallish integers, so you have to be careful to ensure they're unique and non-overlapping. On Windows you have a "SID" which is variable length and (for users) usually a big random number.

https://docs.microsoft.com/en-us/troubleshoot/windows-server...

Windows also differentiates between the human ADMINISTRATOR account and machine "root" accounts like "LOCALSYSTEM".

User accounts are also disambiguated by "domain"; ADMINISTRATOR on the local machine is not automatically the same as the domain-wide ADMINISTRATOR.
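To make the shape concrete, here's a sketch that splits a textual SID into its parts (the example SID and the function are my own illustration, not from the linked docs):

```python
def parse_sid(sid: str):
    """Split a textual SID, e.g. S-1-5-21-<d1>-<d2>-<d3>-<rid>.
    Everything before the last subauthority identifies the issuing
    authority/domain; the final value is the RID. Because the domain
    part namespaces the RID, SIDs need none of the cross-machine uid
    coordination that flat Unix ids require."""
    parts = sid.split("-")
    if parts[0] != "S":
        raise ValueError("not a SID")
    revision = int(parts[1])
    authority = int(parts[2])
    subauths = [int(p) for p in parts[3:]]
    return revision, authority, subauths[:-1], subauths[-1]

rev, auth, domain, rid = parse_sid("S-1-5-21-3623811015-3361044348-30300820-1013")
print(domain, rid)  # [21, 3623811015, 3361044348, 30300820] 1013
```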


Linux user IDs have been 32-bit integers for decades; you can just use some mapping system to allocate them automatically and you'll never run out.

The limitation is that there is one user ID, 0, which can do everything, while all the other IDs can do almost nothing.

This has nothing to do with domains and everything to do with the distinction you describe between the Windows Administrator, LocalSystem, or the even more powerful TrustedInstaller accounts.


Compared to Windows SIDs, Linux's 32-bit uid is a "smallish integer"


Yes and no. Bear in mind, a large part of a Windows SID is a namespace - the actual id within that namespace is so far without exception under 32-bits. An entire Active Directory domain (read: single domain, not forest) is actually limited to 2^30 RID's being issued - after which no new accounts (including computer accounts) can be created, period. You can technically unlock an extra bit and issue 2^31 RID's starting with WS2012, but compatibility is a potential issue and MS's documentation says you should only use it while planning a migration to a new domain (and for good reason).

This does technically give Windows some advantage here as SID's are namespaced - you can have multiple domains in a forest, domain trusts, etc - but I don't think as far as realistic number of users accessing a network it makes much of a difference.

Where it does suck on Linux, however, is user namespaces. 32-bits is a lot when it comes to just giving out accounts, but it's nowhere near enough to give every user a 16-bit chunk of accounts for mapping the traditional 0-65535 (because nobody) ranges for use with unprivileged user namespaces. I'd really like to see a push for 64-bit uid/gid's for this reason.
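The arithmetic behind that complaint can be sketched with an /etc/subuid-style mapping (the base and range here are conventional defaults, but treat the exact numbers as illustrative):

```python
def map_uid(container_uid: int, host_base: int = 100000, count: int = 65536) -> int:
    """Map a uid inside a user namespace to a host uid, /etc/subuid style:
    container uid N becomes host_base + N for N in [0, count)."""
    if not 0 <= container_uid < count:
        raise ValueError("uid outside the mapped range")
    return host_base + container_uid

print(map_uid(0))     # 100000 -- the namespace's "root" is unprivileged on the host
print(map_uid(1000))  # 101000

# Handing each user a full 0-65535 block out of a 32-bit uid space caps
# the number of such blocks at:
print(2**32 // 2**16)  # 65536
```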


And yet in practice the only problem with them is when mapping Windows SID is needed. Otherwise, they are fine.

Also, Windows SIDs are fixed-size 128 bit. They were supposed to be GUIDs, but they are not that random; user SIDs contain common prefix from the domain SID.


Yes though a lot of this came "long after" the NT kernel (at least after NT 4.0)

A lot of NT's innovations are late as well, to be fair, like the whole Application Views (?) of the system (basically FS/Registry app sandboxing)


This is more like LD_PRELOAD. I’m not sure this is a kernel level feature.


> the (underused) personalities system you can use to show different apis to different binaries

Linux has this, but it is, as you say, underused.

https://man7.org/linux/man-pages/man2/personality.2.html


The architecture is really quite interesting, even though Microsoft didn’t make a lot of use of a large part of it.

Sadly, that usually means it’s bugged as hell and impossible to use unless you’ve been trained for many years at ms to do that.


I'm very intrigued by that list. Any recommendations on where to read technical, internal details?


Windows Internals 7th edition


There’s books but also the ReactOS project has been working for many years on replicating the functionality, so you can look at their source.


Object-based handles instead of plain file handles, with capabilities on everything you can do with them.

Since Vista, official support for a C++ subset.

In-kernel IPC, namely LPC, to mimic microkernel architectures.


It's a whole different paradigm and a popular one too. Just being able to look into it and get inspired is enough to be excited TBH.



Almost every time I talked to pseudo NT gurus, IOCP was mentioned as something NT did and Linux didn't. Now, io_uring seems to have finally closed the gap.


io_uring is more like RIO, which is better than IOCP if your packet rates are high.


You mean like epoll on Linux?


No, like io_uring which Linux got in 2019 and still doesn't cover everything yet (e.g. currently mkdirat() is being added). It also often falls back to a kernel-level thread pool, since many e.g. file systems don't implement async IO.


Annoyingly IOCP didn’t cover everything as well - for example, CreateFile of all things!


>since many e.g. file systems don't implement async IO

If we're picking on Linux here, we should also mention that this issue cannot exist on Windows because it doesn't support anything besides NTFS and a couple of options dating back to the 18th century or so.

Oh, and WinFS, of course, F being short for "future", which is like the horizon, or communism: always out there, just a month or two away.


You can actually get drivers for APFS for Windows from third-party vendors, for example.


Sure, and there are third party drivers for ext* and btrfs and god knows what else… but we're talking about official support here, I think. You can find all sorts of craziness in out-of-tree patches for the Linux kernel.


They also have ReFS, a newer filesystem in Windows Server.


My OS book in college had appendices on various OSes.

Online, the oldest is the 7th edition's appendices, which cover A) Unix BSD, B) Mach and C) Windows 2000.

Windows 2000 was the next version of NT. Interesting if long reads about OS architecture.

http://bcs.wiley.com/he-bcs/Books?action=resource&itemId=047...

“In the mid-1980s, Microsoft and IBM cooperated to develop the OS/2 operating system, which was written in assembly language for single-processor Intel 80286 systems. In 1988, Microsoft decided to make a fresh start and to develop a “new technology” (or NT) portable operating system that supported both the OS/2 and POSIX application programming interfaces (APIs). In October 1988, Dave Cutler, the architect of the DEC VAX/VMS operating system, was hired and given the charter of building this new operating system.”


Is the fact that it's a very stable and robust kernel that evolved over different means, with different priorities and expectations not enough?

The kernel itself is very interesting, regardless of what happened in the Linux world. If all you ever look at is Linux, you get stuck in Linux ideas of how things should, or even can, be done.


Proper support for paged kernel memory? Async-by-default I/O? Much better power management (compare Windows vs. Linux battery life on the same hardware).


The latter is definitely because the hardware manufacturers do their QA against it.


> Async-by-default I/O?

What? Can you elaborate? I mean if you want non blocking IO from an fd in Linux, you can just do that. Not sure what defaults have to do with anything. Your code will still have to be written appropriately.


The default in Linux is read()/write() and block. Doing async is a lot more work and was, until relatively (on NT timelines) recently, quite limited (select).

On NT the standard way is async, and doing things in the block-and-wait way is abnormal and unusual.

Defaults matter. Because that's what most people will do.


Linux uses read(2)/write(2) for both blocking and non-blocking IO. I still don't understand what you mean about how NT does it. Either your code is written to accommodate async IO or it isn't.
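The opt-in nature of non-blocking IO on Linux can be shown with a small sketch on a pipe (the same read call either raises or returns, depending on a per-fd flag):

```python
import os

# Linux's default file API is blocking read(2)/write(2); non-blocking
# behaviour is opt-in, per file descriptor.
r, w = os.pipe()
os.set_blocking(r, False)          # flip the read end to non-blocking

would_block = False
try:
    os.read(r, 1)                  # nothing written yet
except BlockingIOError:
    would_block = True             # a blocking fd would have hung here

os.write(w, b"x")
data = os.read(r, 1)               # data available, so the read succeeds
os.close(r)
os.close(w)
print(would_block, data)           # True b'x'
```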


Bad power management is really the manufacturers' fault. See Android phones or Chromebooks that can last days without recharging.


Isn't that what SFU was?


Well, there is this different timeline in the universe where Microsoft decides to keep selling Xenix instead of focusing on MS-DOS.


... and where it is not successful.


That we will never know; Xenix was the most successful PC UNIX variant after all. Microsoft just decided UNIX wasn't the future of PCs.

And they were kind of right, the Year of Desktop UNIX/POSIX is WSL, Android, ChromeOS, macOS.


In addition to its use as a path separator in DOS and Windows, the backslash character itself is also interesting because it is very likely a modern invention, with the first attestation in the 1940s (!). Its original use in ASCII (1960s) was for ALGOL operator digraphs `\/` and `/\`. Its early use as the C escape sequence (1970s, replacing `*` in BCPL) suggests that it carried no significant semantics at that time.


C also has trigraphs: in case your keyboard does not have a \, you can type ??/ instead.

https://en.wikibooks.org/wiki/C_Programming/C_trigraph
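A rough Python illustration of the translation a pre-C23 compiler performed on source text before anything else (a naive sequential replace, not the compiler's exact left-to-right scan):

```python
# The nine C trigraphs. Compilers translated these before any other
# processing, until C23 removed them entirely.
TRIGRAPHS = {
    "??=": "#", "??/": "\\", "??'": "^",
    "??(": "[", "??)": "]", "??!": "|",
    "??<": "{", "??>": "}", "??-": "~",
}

def untrigraph(src: str) -> str:
    """Replace every trigraph sequence with its single-character form
    (illustrative only -- real compilers scan strictly left to right)."""
    for tri, ch in TRIGRAPHS.items():
        src = src.replace(tri, ch)
    return src

print(untrigraph('printf("hello??/n");'))  # printf("hello\n");
```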


Mind blown. C is truly a terrifying language and I can’t help but love it for that.


It was designed in a time when keyboards around the world (German QWERTZ, Cyrillic ЙЦУКЕН) and text encodings (remember that this is pre-Unicode, so we're dealing with ISO 646 and East Asian character sets) often had only a subset of the Latin characters used in the US. Nowadays, it is strongly recommended to simply use the standard American keyboard layout for programming (outside of comments).


I still cannot decide whether it's better to comment in English or in my native language. English works, but it's a weird fit with words from the application domain. Those, I don't really want to translate, but I have to keep doing it because there is so much legacy code with the translations. Sure, I could go on a rampage through the code base and fix everything up (it would be done in an afternoon thanks to IntelliJ), but my team would almost certainly give me lots of flak for it. At the same time, it also feels weird to conjugate English words in my native language.


Apparently there are several digraphs too. I'm convinced this was secretly designed just to make IOCCC contests more interesting.


I wonder how many regex security filters would break if you used ??/ for the escape character, because very few people know this exists.


One of the comments on that blog post says the backslash was invented by Bob Bemer at the time he developed ASCII in the 1960s, also linking to a page that supports the claim:

https://web.archive.org/web/20100612011533/http://thocp.net/...


I briefly checked the comments and missed that. ;-) I would be more cautious about saying that he invented the backslash, as it's unknown whether he was aware of earlier uses of it. (I've checked both the page and its citations and they gave no more information.)


> I would be more cautious to say that he invented backslashes

You're right of course, I was just relaying what the post there said. There was prior usage as early as in the 1940s as you mentioned. According to Wikipedia, [1] its origins are unknown:

> As of January 2021, Wikipedia editors have not been able to find the origin of this character nor even the purposes to which it was put before the 1960s. The earliest known reference found to date is a 1945 bulletin from the Teletype Corporation that lists it as a replaceable part for its Wheatstone perforator.

On the other hand, I think it's safe to say it would have remained an obscure character had it not been for Bob Bemer.

1. https://en.wikipedia.org/wiki/Backslash#History


I feel like some people must have written slashes in whatever direction long before, but it was never formalized until then.


While it was included in early character sets it just wasn't used much - largely because the ubiquitous 029 card punch had no \ key (or [] ... one had to learn the multipunches)


Do you correct people when they say “back slash” when reading out a url? When it’s actually a forward slash?

Is there a name for this phenomenon? Everyone knows it's a slash, except when it's used for something computery. Then it somehow becomes a backslash.


I have to suppress the urge to correct people both when they refer to it as a backslash and when they use the term "forward slash". I've already lost all my friends by being that guy; I don't need to also earn the enmity of random strangers on the internet.

But technically it's a solidus not a slash.


Nothing is wrong with "forward slash".


"forward slash" vs. "forward backslash" would be more consistent.


I don't have an answer but it's exactly the same phenomenon as when people refer to '#' as a hash-tag.


Exactly. Everyone knows it's an octothorp.


Just don't call it "pound" when standing in Britain. :)


I live in Britain and don't have an issue with it, '#' is originally a symbol for pounds, a shorthand for 'lb', and we are used to the idea that pounds can mean weight, as well as our currency, so why not this symbol as well?

It's believed the 'pound' in pound sterling came from a pound of silver or silver coins in weight originally as well.


I've just started calling it "shift 3". That seems to be the least confusing way to communicate the character, haha.


Is it shift-3 on all keyboards? Maybe it is, but I've seen other keyboard layouts where symbols are in different locations.


That will usually work in the US, but Shift-3 returns £ on keyboards configured for the UK!


Is that related in any way to the fact that both symbols can be referred to as a 'pound' symbol, or is it a complete coincidence?


I believe it is related to them both being called 'pound', and it is very annoying. The two are not the same symbol, and I don't know why people conflated them. Technically, in a distant root, they are related, but they are distinct in usage and meaning.


I'm content because that character finally has a name that people will recognize.

But confusing the name of the two slashes (how come?) has the exact opposite effect, so I don't like it.


That's not the same at all. Hash-tag is just a name for that symbol. It's not the original name, but it is a name. "/" is a slash! "\" is a backslash! People who do not type Windows paths have probably never actually encountered a backslash in their entire lives, but when they see a URL they think that the symbol they see all the time in other contexts (/) suddenly has a different name!


A 'hash-tag' is a feature of internet apps like Twitter where you put a '#' in front of some topic name to relate your content with similar content. It's named after the character which is known as a 'hash' among other things.

Or are you saying 'hash-tag' is the name because although it's a mistake it's used so much now it's considered language?

Language isn't one big blob, even though among many people it could now be considered an alternative pronunciation, and eventually it could be adopted even in places where people would otherwise know better, right now among people in tech and certainly on this site 'hash-tag' to mean the character is incorrect and confusing.


But there isn't another character that anyone calls "hashtag". It's like if everyone non-tech called ":" a "semicolon" only when it's preceded by "http". There is something else called a semicolon, and it's not the thing in the URL.


I remember it being the number sign when first learning to type in school. I then heard it referred to as the pound sign. And once I started into the dev world, it became the ubiquitous comment. Then my favorite became the shebang, when paired with the friendly bang/exclamation/pling.


What you mean the octothorpe?!

There are many valid names and usages for this symbol: https://en.wikipedia.org/wiki/Number_sign#Names_of_the_chara...


Or you can raise musicians' blood pressure by calling it a sharp-sign.


everyone knows it's called 'tictactoe'


Contextually, yes. Sometimes it matters, sometimes it doesn't. Same thing goes for the dash/underscore distinction. I imagine there must be similar grappling in the editor world between usages of dash vs. hyphen


In 2007 I joined a company that sold a derivatives trading platform and ran VMS on the core component, the order routing engine that dispatched trading instructions to the respective exchange.

It was really weird working on it at first after having used both Unix systems and DOS for so long. I was very familiar with DOS and had even used CP/M way back in the day when I was a teenager (I'm that old). VMS was a weird amalgam of a pretty advanced OS with a very capable command-line environment like Unix, but with a lot of conventions that felt familiar from DOS. I was aware of some of the history of VMS's influence on DOS, so it was fascinating to work on and I quite enjoyed it, even though it was clearly a dead end at that point.


It was an unfortunate choice because \ is one of the 12 characters that vary between country versions of ISO 646 (of which ASCII was the US profile). This was why Japanese MS-DOS used ¥ instead of \ for the directory separator: it occupied the same code point in the Japanese profile of ISO 646 that \ did in ASCII, and Shift-JIS encoded that as a single byte with the high bit clear.

The complete list of the 12 was: # $ @ [ ] \ ^ ` { } | ~. Notice that / is safe.

They avoided the problem for all the Western languages by inventing their own 8-bit code (this was before the ISO 8859 standards) and always using ASCII in the lower half.
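For reference, the 12 code points in question can be listed directly (a quick sketch; note that / at 0x2F is not among them):

```python
# The 12 ASCII code points that national ISO 646 variants were allowed
# to replace with local characters (e.g. 0x5C: '\' in the US, '¥' in JIS).
variant_points = [0x23, 0x24, 0x40, 0x5B, 0x5C, 0x5D,
                  0x5E, 0x60, 0x7B, 0x7C, 0x7D, 0x7E]
print(" ".join(chr(p) for p in variant_points))  # # $ @ [ \ ] ^ ` { | } ~
print(hex(ord("/")), 0x2F in variant_points)     # 0x2f False -- '/' was safe
```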


It bothered me as a kid that the (ROM) font used in DOS had different thickness for slash and backslash… it wasn’t just a mirrored glyph.

https://int10h.org/oldschool-pc-fonts/fontlist/font?ibm_ega_...


Me too, particularly since some of my favorite programs (various DOS games, especially MegaZeux) used text mode for their graphics


The person that made this decision was the author of the original DOS - Al (Allan) Alcorn. I had a conversation with him nearly 2 decades ago about this very subject. I remarked to him how using the "opposite slash" caused grief for untold numbers of developers. His reply was "yeah, I know. It was a poor decision, and I remember it clearly. I was trying to make DOS a bit unique, and that was all. It was stupid, in retrospect." From the author's mouth.


Funny, I've always imagined it was chosen only because it wasn't a regular slash like on other systems. I never mentioned it though, because I figured I was wrong.


All wrong. "/" was the options-switch character in CP/M, which had no subdirectories. DECs and VAXes had nothing to do with this decision, because only a few humans had access to such machines. I even remember having problems adapting to directory trees in MS-DOS because files were kinda hidden and lost in a floppy directory.


It's true that DOS was originally a CP/M clone, but CP/M didn't have a defined standard option switch character, and in fact many command option switches weren't preceded by a switch indicator or differed as to what option switch character was used. If anything + and - were the most common prefixes for an option switch.

http://www.cpm.z80.de/manuals/cpm3-cmd.pdf

The article is quite right that the convention of using / for command line switches came from IBM.


CP/M was surely the direct influence. Wikipedia says CP/M was itself influenced by TOPS-10, which I think also used the slash for options, so the ultimate origin may be with DEC.


CP/M had them because of DEC, IIRC. Commands like "DIR" and "TYPE" are also indicative. And "PIP".


Can anyone explain why that was a problem, from a technical point of view? They already had paths starting with "driveletter:", unlike Unix with just "/". Why would it be a problem for the parser to distinguish between file paths and argument switches?


Because paths don't have to start with a drive letter – a path like \foo\bar is relative to the current drive.

"DIR \W" does a DIR of the "\W" directory on the current drive.

"DIR /W" does a DIR listing in wide format.


Even worse, there was (and still is) no requirement for a command and its switches to be separated with whitespace.

"DIR/W" is the same as "DIR /W".

Which would have made it impossible to determine whether you want to invoke the command "DIR" in the current directory, or the command "W" in a subdirectory named "DIR".
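The ambiguity can be sketched with a toy tokenizer (function name and behaviour are my own simplification, not COMMAND.COM's actual parsing):

```python
def split_command(line: str, switchar: str = "/") -> tuple[str, list[str]]:
    """Split a DOS-style command line into (command, switches).
    DOS allowed switches to butt up against the command name, so the
    first switchar found ends the command token -- which is exactly why
    'DIR/W' can never mean 'run W in the DIR subdirectory' when the
    switchar doubles as the path separator."""
    cut = line.find(switchar)
    if cut == -1:
        return line.strip(), []
    head = line[:cut].strip()
    switches = [s.strip() for s in line[cut:].split(switchar) if s.strip()]
    return head, switches

print(split_command("DIR/W"))     # ('DIR', ['W'])
print(split_command("DIR /W"))    # ('DIR', ['W'])  -- same thing
print(split_command("DIR \\W"))   # ('DIR \\W', []) -- '\W' stays a path
```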


The DEC operating systems (e.g. RSX with its MCR shell and VMS with its DCL shell) also handled command-line switches this way (e.g. PIP/LIST to list files in a directory).

Of course, they didn't use unixy paths for directories. A fully-qualified file name would be something like (and it's been decades so if I get it wrong forgive me) DRA0:[SYS.USERS.BREGMA.PROJECT.SOURCES]HELLO.C;1 and anyone who was sane would use logical names in DCL to make things readable.


I see, didn't know that


Yes, and this is absolutely maddening when trying to do cross-platform work because some programs support / in filenames while others interpret them as arguments.

(Another key difference is that on UNIX the shell expands '*' before passing it to a program, but CMD doesn't so each program has to do its own globbing)
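That globbing difference can be sketched in Python: a Windows-style program receives the raw pattern in its arguments and must expand it itself (temporary files and names here are illustrative):

```python
import glob
import os
import tempfile

# On Unix the *shell* expands '*' before exec; a DOS/Windows program sees
# the raw pattern and has to expand it itself, e.g. with a glob call.
with tempfile.TemporaryDirectory() as d:
    for name in ("a.txt", "b.txt", "c.log"):
        open(os.path.join(d, name), "w").close()
    argv_pattern = os.path.join(d, "*.txt")    # what the program receives
    matches = sorted(glob.glob(argv_pattern))  # the program globs itself
    print([os.path.basename(m) for m in matches])  # ['a.txt', 'b.txt']
```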


Pretty much all Windows programs support / in filenames. The main issue is some cmd builtins like mkdir/cd/del which have weird parsing rules where every / is interpreted as starting a command-line switch, even if not preceded by a space. But even there, you only need to use quotes (mkdir "c:/test") to suppress the interpretation as a switch and then you can use paths with slashes just fine in batch files.


Paths can drop the drive letter e.g. del \file will try to delete a file in the root of the current drive.

On modern Windows you can use slash, but you have to quote the argument e.g. dir "C:/windows"

You can even mix both types together e.g. dir "C:/windows\system32" which is convenient when using code modules that only output unix style paths. No need to clean them up.
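Python's ntpath module (Windows path rules, usable on any platform) shows how such mixed separators collapse to the canonical backslash form:

```python
import ntpath  # Windows path semantics, importable everywhere

# cmd accepts quoted paths with '/' or mixed separators; normpath shows
# how they reduce to the canonical form.
print(ntpath.normpath("C:/windows/system32"))   # C:\windows\system32
print(ntpath.normpath("C:/windows\\system32"))  # C:\windows\system32
```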


Personally I find Unix, or whatever OS chose /, the bad choice. / is a commonly used character, at least in the USA, in dates. It would be completely normal for someone to want to name a file "Meeting 12/20/1980.txt" or "Budget Sep/12/1985.doc".

Backslash has no common use I know of outside of computer related stuff like regular expressions and escaping things.

Some of you might also have forgotten, but on Mac pre-OS X, at least in the standard Mac devtools provided by Apple, the separator was the colon (:).


RM/COS used . as a path separator. I think this is from OS/360. It makes sense to me in that member selection in C also uses '.'.

Yeah, from the OS/360 wiki: "The file naming system allows files to be managed as hierarchies with at most 8 character names at each level, e.g. PROJECT.USER.FILENAME. This is tied to the implementation of the system catalog (SYSCTLG) and Control Volumes (CVOLs), which used records with 8 byte keys."
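A sketch of that naming rule as a validator (the per-level limit is from the quoted description; the 44-character total is the classic MVS limit and is my addition):

```python
def valid_dataset_name(name: str, max_level: int = 8, max_total: int = 44) -> bool:
    """Check an OS/360-style dataset name: dot-separated levels of at
    most 8 characters each, bounded in total length."""
    if not name or len(name) > max_total:
        return False
    levels = name.split(".")
    return all(1 <= len(level) <= max_level for level in levels)

print(valid_dataset_name("PROJECT.USER.FILENAME"))  # True
print(valid_dataset_name("PROJECTNAME.USER"))       # False: level > 8 chars
```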


My son used to use \ for dates when he was learning to write; to the point I wondered if he was somewhere above 0 on the dyslexia scale. He's 20 now, and I think has grown out of it.


Backslash has no non-computer use since it was invented for ASCII to represent ‘∧’ and ‘∨’ as ‘/\’ and ‘\/’.


why post this wayback version?

Post is still up here: https://docs.microsoft.com/en-us/archive/blogs/larryosterman...

and anyways, (2005).... here's plenty of other discussion from one of the previous posts:

https://news.ycombinator.com/item?id=3723355


>*nix defines hierarchical paths with a simple hierarchy rooted at "/" - in *nix's naming hierarchy, there's no way of differentiating between files and directories, etc (this isn't bad, btw, it just is).

I don't see the problem, but I guess it's just a personal preference.


>this isn't bad, btw, it just is


During the time DOS 2.0 was in development, I visited Microsoft with a group of colleagues. (We were touring computer manufacturers both in Seattle and in Silicon Valley, where we visited Digital Research, Intel and many others; in those days it was comparatively easy to arrange a tour of these enterprises.)

We were at Microsoft quite some hours (and we seemed important enough to be fed lunch which consisted of very good sandwiches). One of the people who toured us around was a DOS developer (who in his 'spare' time also contributed to the Flight Simulator development). He spent considerable time discussing the new DOS subdirectory matter as it was a hot topic back then. Anyway, he received somewhat of a tongue-lashing from us about the backslash 'problem' and it was very clear to us that he too was not in favor of it although for obvious reasons he chose his words carefully.

This brings me to more ergonomic problems that Microsoft has never bothered to solve with its operating systems—DOS or Windows. The first I'll mention is the annoying reserved character problem, specifically: < > : " / \ | ? * cannot be used in a filename. I'm aware these characters are also deemed illegal in the filenames of other operating systems, but I fail to see why, after about 30 years, we still have to worry about avoiding them. If Microsoft had fixed the problem back then, there would have been pressure for other operating system developers to also fix it. Just because other operating systems were behind the times, it didn't mean Microsoft had to be—after all, in the early days, Microsoft went to considerable trouble to please users in the usability stakes, even to the extent that it put security severely at risk in the process.

I fail to see why Microsoft couldn't have coded around this problem and allowed the use of these characters. It went part of the way by allowing spaces within filenames in Windows, and it also allowed spaces to be entered into command-line filenames with quotes: "My first Name.doc". The fact that these characters cannot be used has caused considerable trouble for IT staff over the years.

I'd hate to think how many thousands of hours have been wasted by both users and IT staff over the past three decades or so on what ought to have been a trivial matter to fix. Similarly, I hate to think how many times I've had to enter a ¿ into a filename just because the damn operating system will not let me enter a normal question mark: ?.

Another major stuff-up is the maximum filename length/max path length of 254/255 - 260, when the path length could potentially be 32,767 characters—as the OS already calculates the path to this length internally (the exact length varies between OS versions). These days, this limit is ridiculous. If, say, you have a file with a filename 245 characters long in directory \MyFiles and then move the directory deep down into nested directories, you automatically have a problem that you're not necessarily aware of until a cannot-continue crash occurs during a backup. Having to regularly run a max-path-length utility across the disk to search for potential problems is a damn nuisance, and it ought to be completely unnecessary.

The same problem occurs when saving web pages with long names; these often exceed the maximum filename length and the page cannot be saved without manual intervention. To say ≈255 characters for a filename is long enough is just not realistic these days. Here's another instance: say one wants to save a book with a long title from the Internet Archive, and to avoid confusion later over having a cryptic filename one adds the book's title to the already-cryptic IA filename, i.e.:

Books.<…>.with_very_long_names_are_common_on_the_IA_+_the_Internet_Archive_filename_abcxzy123.pdf

Many a time the title combined with the IA filename has exceeded 255 characters, sometimes by a large margin. Shortening the filename at this juncture wastes considerable time, especially if there are many files involved.

Oh, and there's another PIA worth mentioning: .MSI files cannot be loaded from a directory when the directory has a leading blank (space) in its name, whereas an .EXE file can. Now how did that come about (and it's never been fixed)? [Leading spaces in directory names are useful, as such directories and files are automatically sent to the top of the file manager tree—a very useful technique I've adopted for years to highlight temporary work files or sorting directories, etc. Again, this is necessary due to another operating system limitation, namely that neither DOS nor Windows has any way of allowing a user to order the file/directory structure to meet his or her needs.] Other obvious limitations are that we cannot highlight filenames or directories: we cannot make them different colors or even give filenames different typefaces. Why not?

As I've said for years, operating system developers don't care much about user ergonomics. If they did, then by now we'd even have a new file system to replace the existing one, which is truly antiquated. A new file system would include metadata extension(s) within files that OSes and programs would both understand (but that's far too big a matter to discuss here).

When one thinks about it, we users really have been shortchanged by the likes of Microsoft and others over the years.


> If Microsoft had fixed the problem back then, then there would have been pressure for other operating system developers to also fix the problem.

I don't know about all of the other reserved characters, but the colon is the path-separator character in classic Macintosh APIs and I doubt they would have ever been able to "fix" that.

> To say ≈255 characters for a filename is long enough is just not being realistic these days.

This was a limitation of old APIs and has been possible using the Unicode-aware APIs for a couple decades now, but as of Windows 10 it's possible to use long paths via the traditional APIs as well if an application declares a special manifest flag. Check out "Enable Long Paths in Windows 10, Version 1607, and Later" here: https://docs.microsoft.com/en-us/windows/win32/fileio/maximu...

Otherwise, try using those long paths as e.g. "\\?\C:\Users\Lammy\Downloads\Books.<…>.with_very_long_names_are_common_on_the_IA_+_the_Internet_Archive_filename_abcxzy123.pdf"
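If you go that route, note that the prefix only applies to absolute paths, and UNC shares need a slightly different form. A small helper of my own devising (a sketch, not any Windows API) that adds the extended-length prefix:

```python
def extended_length(path: str) -> str:
    """Prefix an absolute Windows path for the 32,767-char extended-length form."""
    if path.startswith("\\\\?\\"):
        return path                        # already in extended-length form
    if path.startswith("\\\\"):            # UNC: \\server\share -> \\?\UNC\server\share
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path

print(extended_length("C:\\Users\\Lammy\\Downloads\\book.pdf"))
# \\?\C:\Users\Lammy\Downloads\book.pdf
```

One caveat: the extended-length form bypasses normalization, so `.` and `..` components are not resolved for you.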

My number one peeve for a long time was "Documents and Settings" instead of "Users" on Windows XP, but I've come around to that once I realized it was probably intentionally-annoyingly-named to force app developers to use modern APIs since it seems to intentionally break the 8.3 length convention and force you to deal with escaping the spaces.


I was happy when I learned that "Docume~1" worked for "Documents and Settings".
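That short name comes from the old 8.3 mangling scheme. A deliberately simplified sketch of the idea (the real NTFS algorithm also strips invalid characters and switches to a hash after a few collisions, so this is illustrative only):

```python
def short_name(name: str, n: int = 1) -> str:
    """Simplified 8.3 short-name mangling: first 6 alphanumerics + ~N, upper-cased."""
    stem = "".join(c for c in name if c.isalnum())[:6].upper()
    return f"{stem}~{n}"

print(short_name("Documents and Settings"))  # DOCUME~1
```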


"I don't know about all of the other reserved characters, but the colon is the path-separator character in classic Macintosh APIs and I doubt they would have ever been able to "fix" that."

The fundamental issue is that users should be able to type any character that appears in day-to-day use, colons included, whether in a book title, report name, movie title or whatever, without ever having to worry about it.

The fact that they cannot, and must deliberately transcribe or shorten a name to accommodate an operating system's limitations, wastes time and leads to errors and confusion (anyone who has ever run an IT help desk in a large organization knows this).

I'm aware of the Win 10 'long paths' fix, and I've also seen the registry patch which some suggest possibly fixes earlier versions of Win 10 (I've not tried it). That said, the filename length limitation remains a problem for two reasons: the filename is still too short, and many programs are likely to crash on filenames exceeding 255 characters (as they'd be unaware of them). To overcome this, the operating system would also have to present a shortened 255-character filename to such programs, as it did in the early days for programs that only understood 8.3 filenames. As I see it, it's only a halfhearted, too-little-too-late tweak, and the job needs to be done properly.

"My number one peeve for a long time was "Documents and Settings" instead of "Users" on Windows XP..."

This is still a problem; in fact, it's a real pain. Right, D&S was a pain, but so too are 'Users' and 'ProgramData'; both should be completely movable, even to the extent of working from a USB stick. Whenever I set up Windows, it takes me days to configure my programs so that they dump their data/save files to specific locations on other drives (this makes transferring data to Linux etc. much easier, and it's much safer too if files can be kept in locations where they aren't expected to be found). In fact, this should apply to all user info, including users' program files. As I see it, these limitations are just bloody-mindedness on Microsoft's part (where the user isn't an administrator, the administrator would still be able to lock user directories and files to suit local policy).

Yes, there was an excuse for the problem 20-30 years ago, but not nowadays. Unfortunately, this is a 'mindset' problem among programmers. They're so used to acronyms and shortening things that they don't realize, or don't care, that ordinary users do not understand why they cannot just type anything as they did on typewriters (and something I've not yet mentioned: why they cannot type between already-typed lines or within margins, as old-timers continually claim they once could do with ease but can no longer do).


Where it says FileName.Extension[,Version], e.g. MONITR.EXE,4, what was the version used for? Did it track changes?


I don't know about DEC-20, but on VMS, changing a file makes a copy with a new version number:

  $ edit foo.txt
  ...
  Ctrl-Z
  $ dir
  foo.txt;1
  $ edit foo.txt
  ...
  Ctrl-Z
  $ dir
  foo.txt;1 foo.txt;2
When you delete a file, you have to specify a version field (blank for latest):

  $ del foo.txt;
  $ dir
  foo.txt;1
IIRC, you can configure how many versions to keep.


Yes, that's right: the number of versions to retain was configurable, and the whole versioning mechanism was really useful; it's a great loss in today's OSes. The VMS file and directory syntax, otoh, was a real pain. Unix definitely wins there.


The version number was used the same way in TOPS-20 (which I never used) and Tenex, its predecessor (which I did). Emacs still has a facility for numbering backup files in the same way.


hm, because of CP/M, no? it's also where the drive letters came from (and some command names)


You mean CP\M? :-)


TL;DR: Because the / character was used for options (as opposed to -, as on UNIX).


At the time (in MS-DOS 2.0) you could set SWITCHAR=- in CONFIG.SYS to override this setting. @kiwidrew's post above even has an excerpt from the source that demonstrates it:

https://news.ycombinator.com/item?id=26272844

So in principle they could have thought of having something like PATHCHAR=/ as well.


Probably the second most expensive mistake after 'null'.

Discuss... :)


What 'null' mistake?


Tony Hoare apologising for inventing the null reference:

https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retra...

> This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.[26]

So much time, and therefore money, working around `\` escaping characters in file paths.


One reason might be that DOS used the "/" character for options, whereas in other operating systems the "-" character is used.


Not sure if you realise but this is not an "Ask HN" post, it's a link to an article, which of course mentions the reason:

> They couldn't use the *nix form of path separator of "/", because the "/" was being used for the switch character.


I have a feeling that the "-" character for an option is a Unix (Multics)-only thing.

It is also worth looking at https://retrocomputing.stackexchange.com/questions/4695/slas... where there is some discussion of the history.

Along with https://retrocomputing.stackexchange.com/questions/7030/why-...

and

https://retrocomputing.stackexchange.com/questions/4686/wher...


From the third link:

> (I still think the Multics path seperator looks more natural >etc>bin - too bad Unix diverted here)

Looks more natural indeed. In fact it's visually similar to how paths are stylised in some file picker GUIs.


Yeah but imagine having to use Shift for every path separator you type.


Germans type Shift+7 to get /.

Imagine their horror when having to type AltGr+- to get a backslash. At least with Shift you can choose which of the Shift keys to use, but there is only one AltGr key (to the right of the spacebar, where the US layout has the right Alt key). It used to be that you could use Ctrl+Alt+- as well, and at least those keys were also available on the left-hand side. But I don't know whether Ctrl+Alt still works nowadays.


> Germans type Shift+7 to get /.

Those that are too stubborn to use QWERTY, yes. Seriously, the german keyboard layout is horrible for programming and the few umlauts can easily be entered using compose keys / dead keys / alt combinations / whatever you fancy.


Keyboards may have evolved differently to compensate. The location of non-alphanumeric characters wasn't so consistent back then.


Multics inherited - as a switch character from the earlier MIT time-sharing system CTSS.


I never understood why Unix chose "-" for options. It creates unnecessary confusion. E.g. when deleting a directory called "-rf"


Another viewpoint: the problem is caused by Unix allowing any character in filenames (other than slash and null). Other platforms are more restrictive in what characters filenames are allowed to contain. If you banned the '-' character from starting a filename, the problem wouldn't happen.

I personally think Unix allowing almost any character in filenames was a mistake. You can put newlines and other control characters in filenames. That has very little legitimate use, and is a potential source of security and other bugs. There is a proposal to amend the Unix standards to disallow control characters in filenames. But it doesn't look like it is going to be successful: https://www.austingroupbugs.net/view.php?id=251
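The footgun is easy to demonstrate: a newline in a filename silently corrupts any line-oriented processing of directory listings. A sketch (assumes a Unix-like system, where such names are legal):

```python
import os
import tempfile

d = tempfile.mkdtemp()
open(os.path.join(d, "evil\nname"), "w").close()  # legal on Unix!

# One-name-per-line output, as a naive script would consume `ls`:
listing = "\n".join(os.listdir(d))
print(len(listing.splitlines()))  # 2 "lines" for a single file
```

This is exactly why `find -print0` and `xargs -0` exist: NUL is the only byte guaranteed never to appear in a pathname.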


Agreed!

Shell scripting would be so much more sane (and safe) if filenames couldn't contain spaces (or control chars) and couldn't begin with a '-' character. Then the shell's default $IFS would work as intended in the presence of pathname expansion, and there would be no need to use '--' to delineate the filename arguments from the option arguments when executing commands.


A coworker of mine got into trouble when he named a directory ~ and then tried to delete it.


I once named a file $HOME and then carelessly tried to delete it without thinking twice.


> the problem is caused by Unix allowing any character in filenames (other than slash and null).

Most Unixes don't allow any character; they allow any byte other than ascii slash or zero. Turning bytes into characters is outside the scope of most Unix kernels and the filesystems therein.


> Turning bytes into characters is outside the scope of most Unix kernels and the filesystems therein.

All the major contemporary Unix(-like) kernels do have code in them to do file path charset translation. It is very important when dealing with removable media (ISO-9660, UDF), FAT filesystems, network filesystems (especially CIFS/SMB, but even some NFS implementations), filesystems defined in terms of Unicode such as NTFS, HFS+, APFS.

Traditional Unix filesystems don't do this, but they were originally designed at a time when few clearly distinguished the concept of byte from the concept of character.
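As a concrete illustration of the bytes-vs-characters split, Python papers over it with the surrogateescape handler: filenames are raw bytes at the kernel boundary, and undecodable bytes round-trip through str (a Unix-only sketch):

```python
import os
import tempfile

d = tempfile.mkdtemp()
raw = b"caf\xe9"                  # Latin-1 bytes, not valid UTF-8
open(os.path.join(os.fsencode(d), raw), "wb").close()

name = os.listdir(d)[0]           # str, possibly containing a lone surrogate
print(os.fsencode(name) == raw)   # True: the original bytes survive the round trip
```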


The only solution to that confusion would be not allowing the options identifying character in file/directory names. Changing the character would just "change" the confusion.


Wouldn't / just cause confusion trying to delete a directory called /rf ?


There's no directory called /rf, so the mistake can do no harm, which is how good design should work.


Your comment is based on the premise that there is, in common use, a directory called “-rf” but not one called “/rf”? Why would the former be any less likely than the latter?


Technically deleting a directory is done with the rmdir command. But rm with flags is flexible enough that I've seen very few people use rmdir


rmdir is useful when you want to remove all empty subdirectories without touching files or directories that contain files. Just do 'rmdir *' and you're done.


I like using rmdir even for individual directories that I think are empty just so that I don't accidentally delete any files that I did not expect to be there.


I use it whenever I expect a directory to be empty and want an error if for whatever reason it isn't. It's a bit niche, but good to have in your toolbelt.


There's always going to be ambiguity.

rmdir /f /usr

Is that force-deleting the /usr directory, or the /usr and /f directories?


I mean on Windows it's

   rd /s /q \path\to\something
There's zero ambiguity because both "/" and "\" are special characters.


Except that on Windows, you can use "C:/path/to/something" nearly everywhere where it accepts "C:\path\to\something".
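Python's pathlib reflects this: PureWindowsPath treats both slashes as separators and normalizes to backslashes when rendering the path:

```python
from pathlib import PureWindowsPath

# Forward slashes are accepted as separators on input...
p = PureWindowsPath("C:/path/to/something")
print(p.parts)   # ('C:\\', 'path', 'to', 'something')

# ...but the canonical string form uses backslashes.
print(str(p))    # C:\path\to\something
```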


Except C:/ is not a command option, and only Unix people would write "/"


Sounds a bit provincial. That might have been true before mainstream internet access, but the average user these days is likely more familiar with unix style paths via URLs than local filesystem paths.

Also, most non-unix operating systems that don't happen to be made by Microsoft also use the forward slashes for paths.


> the average user these days is likely more familiar with unix style paths via URLs than local filesystem paths

The average user ignores the contents of the address bar. "That's all tech gobbledegook". Increasingly, browsers even hide its contents from the user, just displaying the domain name, making the average user even less aware of it.

> Also, most non-unix operating systems that don't happen to be made by Microsoft also use the forward slashes for paths.

What are "non-unix operating systems that don't happen to be made by Microsoft". Non-Microsoft operating systems in common use – Linux (including Android), macOS/iOS/Darwin/XNU, *BSD – are Unix-like, and hence I wouldn't really call them "non-unix" (even if they are not strictly speaking certified as such)

If we look at non-Microsoft non-Unix(-like) operating systems (none of which are commonly encountered nowadays), we see a lot which use neither forward nor backslashes for directories. For example, OpenVMS and RISC OS both use dots, classic MacOS used colons. Stratus VOS uses the greater-than sign, which it inherited from Multics. The IBM mainframe operating system MVS (nowadays called z/OS) uses dots to separate the components of a dataset name – although those components aren't exactly directories. (It also supports forward slashes in its Unix compatibility subsystem, but that wasn't around for the first 25 years of its existence.)


>> What are "non-unix operating systems that don't happen to be made by Microsoft".

It's a pretty big list. To name a few:

BeOS (and derivatives), AmigaOS (and derivatives), GEOS, Commodore DOS, TempleOS, TRON, plenty of real-time operating systems...


You can make that exact same argument regardless of which switch character Unix uses. The real problem is not the switch character itself, but the approach standard Unix tools use: "parse options until parsing fails, then assume it's an argument", which becomes an amazingly great foot gun when an actual option is inserted by a shell glob or a shell variable.


For everybody reading, a directory called "-rf" is deleted by writing "rm -- -rf"; -- indicates the end of all user-provided options.


If you're working with files, another (perhaps more reliable) way is `rm ./-rf`.



