A deeper dive into our May 2019 security incident (stackoverflow.blog)
226 points by alex-warren on Jan 25, 2021 | 58 comments



I found it interesting that the attacker looked for help on the victim's own site. I guess it truly proves what a good repository of information Stack Overflow is.


There's a new service SO could offer: help a company under attack or recently attacked correlate the methods with suspicious users on SO, based on IP addresses and the presumption that attackers would use the same system to get help as used in the attack.


I feel like there are privacy implications around that. You'd need to be careful that it couldn't be abused.


Although the article was written in an extremely straightforward and dry technical manner, this was comedy gold.


Should probably be classified as a meta-breach. :)


Yeah, I'm pretty sure the reason why the article keeps repeating that is not a desire to provide the most detailed information about the breach...


Kudos to the team over there for being as transparent about what happened and where they were not following best practices - I am pretty sure most companies would not publicly admit this:

we had secrets sprinkled in source control, in plain text in build systems and available through settings screens in the application.


> However, there is a route on dev that can show email content to CMs and they use this to obtain the magic link used to reset credentials

So many sites do this: allowing major changes to be effective immediately (like resetting credentials/password) by simply opening a "magic link" sent by email.

I think that this "immediately" is a major security antipattern.

I prefer it when such changes have a "cooldown" period of, say, 72 hours, during which the change is pending but not yet effective and the user can veto it: either by logging in to the site, where they'd get a warning that a major configuration change is pending and could deny it there, or by opening another "magic link", sent by email, which denies the change.

It's not a perfect solution but it stops so many of these oh-so-common attacks dead in their tracks.
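The cooldown-plus-veto idea can be sketched roughly like this (a minimal in-memory illustration with hypothetical names; a real system would persist pending changes and email the veto token):

```python
import time
import uuid
from dataclasses import dataclass, field

COOLDOWN_SECONDS = 72 * 3600  # 72-hour veto window

@dataclass
class PendingChange:
    user_id: str
    new_password_hash: str
    requested_at: float
    veto_token: str = field(default_factory=lambda: uuid.uuid4().hex)
    vetoed: bool = False

pending: dict[str, PendingChange] = {}

def request_password_reset(user_id: str, new_password_hash: str) -> PendingChange:
    # Record the change but do not apply it yet; the user would be emailed
    # both a confirmation notice and a veto link carrying veto_token.
    change = PendingChange(user_id, new_password_hash, requested_at=time.time())
    pending[user_id] = change
    return change

def veto(user_id: str, token: str) -> bool:
    # The legitimate user (logged in, or via the emailed veto link)
    # can cancel the change during the cooldown window.
    change = pending.get(user_id)
    if change and token == change.veto_token and not change.vetoed:
        change.vetoed = True
        return True
    return False

def effective_password(user_id: str, current_hash: str, now: float) -> str:
    # The new password only takes effect once the window elapses un-vetoed.
    change = pending.get(user_id)
    if change and not change.vetoed and now - change.requested_at >= COOLDOWN_SECONDS:
        return change.new_password_hash
    return current_hash
```

An attacker who intercepts the reset email gets nothing usable for 72 hours, during which the real owner has a chance to notice and veto.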

Because there's a big difference between being able to read an email meant for someone (as happened here, on the server side) and being able to prevent a legit user from receiving emails while also being able to prevent that legit user from logging in to a website with their correct credentials.


I don't think it's a good idea to make a password reset take 3 entire days. In this case, I'd say the costs outweigh the benefits.


Yea, 10 minutes & a text message would suffice, IMO...


I think the most important part would be to give someone time to vet that it's legitimate. Stack Exchange has on the order of 100 developers, it wouldn't be hard to CC account creation or password reset notices to the manager of a new hire, and in that case, 10 minutes would often be enough to say "Uh, I haven't hired anyone named Curious Llama, who are they and why are they requesting developer access to an obsolete resource?" and put the brakes on.


That might not have helped in this case, if the email was sent from the same account.

The hacker had access to messages from the sender side without access to the account being reset.


I think the problem is showing the "magic link" anywhere other than in the email for the intended recipient, since it's effectively a password.

As others have mentioned though, if you as a user know someone has access to said password and you're resetting it as an emergency, that's 3 more days of some hacker being able to log in with your password!


While I partially agree, letting a user deny the password change with the old password is pretty horrible in case the password was leaked. And if your dev forgot their password after the holidays, you're looking at 3 additional free days for them.

I see your point, but a timeout is a suboptimal solution.


> ... letting a user deny the password change with the old password is pretty horrible in case the password was leaked.

That's a different threat: an attacker knowing your password. Well... how do websites that allow instant credential resets by email typically deal with a password reset requested by someone who knows the current password? Instant change too ("enter your current password / enter your new password twice"). And the good guy is locked out of his own account. I don't see how it's worse than that.


The next time you forget a password and need to reset it, how likely are you to be willing to wait three days?


> The next time you forget a password and need to reset it, how likely are you to be willing to wait three days?

10 minutes like others suggested is way too short, that said: it wouldn't catch attacks happening at night.

But to answer your question: it really depends what it is that you are protecting. For the vast majority of sites I use, I don't see how 72 hours without access would be that problematic. Not logged in to Stack Overflow for 3 days? Not a problem. Not logged in to HN for 3 days? Not a problem. Not logged in to Twitter for 3 days? I can live with that. Etc.

The question is: how much convenience are you willing to trade for security?


Ten days is better than forever, FWIW, which is something that many websites do.


That was an interesting read. I'm left wondering "why" though. Anyone care to take a wild guess what they were after? That seems like quite a bit of work to be just doing it for no particular reason.


Given the focus on enterprise systems and teams, it really looks like it was a SolarWinds-type (but lower-sophistication) attack where SO wasn't really the target. The targets were users of SO Enterprise or Teams products.


If that's the case, why would they elevate privileges on the main SO site and draw attention to their successful intrusion?


My guess: they were trying to create a public proof that they had access to Stack internals (probably so they could sell it), and weren’t familiar enough with StackExchange administrivia to realize that the users would immediately notice and be suspicious of a new non-elected moderator.


Because they are working blind.

They are trying to find something that they don't necessarily know exists. They also don't know what tripwires exist and after trawling around for so long, they might even have assumed that SE didn't have any monitoring systems.


Of the two responses here, yours strikes me as the more plausible, non-cartoonish one. I think it's good to come to these things with an understanding of how behaviors can be happenstance and come from an attacker negotiating with limited information or their own limited understanding.


I’m doing my best not to be insulted by your description of my conjecture as “cartoonish.” Maybe one of us has misunderstood the other, but it seems to me that I’ve also proposed a mechanism that involves the attacker having limited understanding.

From the blog post, it sounds like the bit where they were working in the dark was getting things to run on a production database, but the SQL command to give them moderator privileges was prepared ahead of time (which makes sense, they had a local site set up and could prepare and test that part at their leisure). It seems very unlikely to me that they would have spent so much effort on getting that particular SQL to run for no reason, so the moderator privilege in particular probably had some appeal to them from a technical perspective. The main permission I can think of that you get as a mod is the ability to edit posts. It seems unlikely that an attack this sophisticated would be just for vandalism, so I consider the profit motive instead. “For sale: Access to Stack Exchange internal infrastructure. As proof, ask me to edit any post on Stack Overflow.”

What tripped them up was not a technical tripwire, but rather how intimate the SE community is with their moderators. This wouldn’t be obvious from the codebase; only if you’d spent some time on the meta sites would you be aware of the culture surrounding SE mods. On, say, Facebook, getting a global moderator bit isn’t something that a big chunk of the user base would have name recognition for. My stereotype is that someone who’s trying to break into infrastructure (and not doing responsible disclosure) probably isn’t also volunteering in the review queues, and so wouldn’t realize what a risk this would be.

Of course this is all conjecture and we’ll never know for certain without tracking down the attacker and asking them, but I’d like to think I’ve constructed a reasonable scenario that takes all of the known facts and a probable motive and comes up with a plausible explanation for what was (in hindsight, at least) a fairly significant blunder.


Exactly what I thought. It would probably have been more beneficial for the attacker to report the security vulnerabilities and receive a bounty in return.


> A significant period of time is spent investigating TeamCity—the attacker is clearly not overly familiar with the product so they spend time looking up Q&A on Stack Overflow on how to use and configure it. This act of looking up things (visiting questions) across the Stack Exchange Network becomes a frequent occurrence and allows us to anticipate and understand the attacker’s methodology over the coming days.

Awesome writeup - this gave me a good laugh :-)


"However, there is a route on dev that can show email content to CMs and they use this to obtain the magic link used to reset credentials."

Zawinski's Law: "Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can."


Gonna clarify here, because that description is a bit misleading: this wasn't a route that allowed viewing sent emails, it was a route that allowed viewing what would be sent if a password reset was requested.

The story behind that route might be interesting... See, originally Stack Overflow didn't have passwords - all logins were done via OpenID, so any credential management you'd need to do was done through your provider (Google, LiveJournal, myOpenID, etc.). This made account recovery assistance pretty simple: given a verified email address, the system would just send that address an email that reminded the owner of any and all OpenID providers that they'd associated with their account. From there, it was up to the account owner to work with a provider to do things like reset passwords.

Skip forward a few years, and Stack Overflow had its own OpenID provider - now you could sign up with an email and password just like a normal site, except really you were creating an account on https://openid.stackexchange.com/ - so the recovery process remained pretty much the same, just with a new provider that happened to be run by the same company.

So far so good... Except, this was awkward to explain to folks. Really, that was what ended up killing OpenID: folks wanted a "Google" or "Facebook" button, not a whitepaper on fancy new authentication systems.

At this point Stack Overflow decided to try to streamline the login process, making signing up and logging in with their own provider seamless: no need to know anything about OpenID. Now recovery emails started including password reset links, and also reduced or removed information on other OpenID providers that were associated with the account in an effort to reduce confusion. The decision tree for generating those emails got complex.

And the decision tree for supporting users got complex as well. Support staff got frustrated; they'd been used to knowing what would and wouldn't be in a recovery email, and had a pile of templates ready to help folks navigate login issues based on that. But now they were getting replies back from folks who were confused and upset because their recovery email didn't contain information that the support person had asserted it would!

This was the genesis of the vulnerable route: a way for support staff to ensure that they were providing accurate information to users about how they could recover their accounts. By the time of this attack, it was already obsolete; the login system had been redesigned twice since the confusing and complex system that first required it. It was vestigial and forgotten... The ideal breeding ground for vulnerabilities.

(source: I worked at Stack Overflow through the time period described in this post, and was involved in support during the period when the relevant route was useful)


Yeah. I'm almost certain I was the one who got so sick of having no idea what a user would see when they opened their email that I asked for _some way_ of seeing it. (Otherwise it was this strange dance of "Request a password recovery and tell me what it says.") I don't recall if I ever considered that it might be a _massive security hole_ if anyone got hold of it. In retrospect . . .

(I overlapped with Shog at Stack Overflow.)


It only became a problem in combination with other missteps (or constraints, like dev needing to be routable).

Which is what makes this kind of stuff so insidious.


Excellent writeup, and it shows that Stack Overflow's account management has come a long way.

Still, there is room for improvement. What confused me a lot recently was that the reset link sent to a certain email is not necessarily for the login associated with that email.

I tried to log in at Stack Overflow after a long time. Entered my current mail and pw. Did not work, clicked pw recovery, received mail, reset pw, got logged in. So far so good.

Logged out, couldn't log in again. After a few password resets I realized that, while the mail was sent to my current address, the reset link actually was for the pw of a login associated with an old email.

At least for me that was not clear from the recovery email. Here is the full text with only the emails redacted:

> Account Recovery - Stack Overflow

> We received an account recovery request on Stack Overflow for new@example.com.

> If you initiated this request, reset your password here.

> You can use any of the following credentials to log in to your account:

> Email and Password (old@example.com)

> Email and Password (new@example.com)

> Once logged in, you can review existing credentials and add new ones. Simply visit your profile, click on Edit Profile & Settings and My Logins.

To be clear, "reset your password here." is a link and it changes the pw only for old@example.com.


Yeah, this doesn't surprise me. The login system at SO is unnecessarily complex - by which I mean that the complexity of the interface does not match the complexity of the underlying system, because while both have been redesigned several times they've never been completely redesigned together.

So you end up with weird situations like this, where both the interface and the underlying system support multiple associated email addresses, but not in the same ways or with the same functionality exposed.

It is... A legacy system, with all that that entails.


> Our dev tier was configured to allow impersonation of all users for testing purposes, and the attacker eventually finds a URL that allows them to elevate their privilege level to that of a Community Manager (CM)

> After attempting to access some URLs, to which this level of access does not allow, they use account recovery to attempt to recover access to a developer’s account (a higher privilege level again) but are unable to intercept the email that was sent. However, there is a route on dev that can show email content to CMs and they use this to obtain the magic link used to reset credentials.

Many of these debugging tools are great for devs to test things quickly, but I've always felt very wary of having them exist in an app without some strict access control with 2FA. Ideally you'd not have them in the app at all, maybe just on local dev.


As someone who doesn't work much with software teams, can someone fill in my gaps in understanding the timeline?

I'm imagining that after a security issue is identified, the steps taken are roughly in the below order, with close-ish durations. I guess my question is, why does it take 20 months from start to blog post?

- Contain the issue (1 wk)

- Remove the threat (1 wk)

- Build up remedies (a few months)

- Check and recheck what happened to make sure you're accurate when submitting final reports (a few months)

- Release a blog post (1 month)

The timeline is a cool day by day instance, but I just don't understand the larger timeline.


I assume it's related directly to "It’s been quite some time since our last update but, after consultation with law enforcement, we’re now in a position to give more detail".


It's this.

Discovery, immediate mitigation, deeper mitigation, general notice, notifying affected users - all these can happen pretty quickly once the ball is rolling. Once you're dealing with "the law" in any capacity you are constrained in what details you can share broadly, and when.

I'm happy we were finally able to share this level of detail.


Did they figure out who was behind these attacks? They seem to be quite sophisticated, long-running attacks, to dig so deeply into the SO system.


The chronology has some dating issues, starting with "Tuesday May 15th" (Tuesday was the 14th) and continuing on.


Ouch, good catch, fixing now


Something to do with the difference between UTC and New York time zones, perhaps?


Thank you SO for being open and listing the best practices. It seems like even a few security best practices make it harder for hackers to get into your system.

I have database connection strings and passwords as ENV variables, but I still don't know what the best practice is. Let's say someone gets access to the server: they can still read the ENV vars, right? It definitely prevents accidentally checking them into your git repo. But still. Does anyone have a good recommendation for storing credentials like database passwords in a secure way?


> Lets say someone gets access to the server, they can still read the ENV vars, right?

Correct. The easiest way is to look at `/proc/$pid/environ`. It contains the \0-separated values for that process.
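For example (Linux-only sketch; `DB_PASSWORD` is a hypothetical secret, and the child is just a `sleep` so there's something to inspect):

```python
import os
import subprocess
import time

# Spawn a child process with a (hypothetical) secret in its environment,
# then read it back out of procfs the way anyone with access to the
# server could.
child = subprocess.Popen(
    ["sleep", "5"],
    env={**os.environ, "DB_PASSWORD": "hunter2"},
)
time.sleep(0.2)  # give the child a moment to exec

try:
    with open(f"/proc/{child.pid}/environ", "rb") as f:
        raw = f.read()
finally:
    child.terminate()
    child.wait()

# Entries are NUL-separated KEY=VALUE pairs.
env = dict(e.split(b"=", 1) for e in raw.split(b"\0") if b"=" in e)
print(env[b"DB_PASSWORD"].decode())
```

Note that `/proc/<pid>/environ` reflects the environment at exec time, so it's readable by anyone who can read that file for the target process.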


I don't think there's a magic way to do this: if your app can connect to the database and someone has access to your app server, they have access to your database as well.
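That said, one common incremental hardening over plain env vars is to read the secret from a tightly-permissioned file at startup, so it doesn't leak via `/proc` or get inherited by child processes. A sketch (the temp file keeps it self-contained; a real deployment would use something like a root-owned file under `/etc` or a secrets manager):

```python
import os
import stat
import tempfile

# Stand-in for a deploy-time secret file; mkstemp creates it with mode 0600.
fd, SECRET_PATH = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("hunter2\n")  # hypothetical database password
os.chmod(SECRET_PATH, 0o600)

def load_secret(path: str) -> str:
    # Refuse the secret if the file is readable by group/others, so a
    # misconfigured deploy fails loudly instead of leaking silently.
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRGRP | stat.S_IROTH):
        raise PermissionError(f"{path} is readable by other users")
    with open(path) as f:
        return f.read().strip()

password = load_secret(SECRET_PATH)
```

It doesn't stop an attacker who fully compromises the app's user account, but it does narrow who can read the secret and makes sloppy permissions an error rather than a quiet exposure.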


Interesting that most of the mitigations are "move resource behind firewall." Kind of an indictment of the whole BeyondCorp idea - unless we really trust our 2FA to never have any access bypass issues like the initial access to the dev environment here. Speaking of that, I didn't see "fix bug allowing unauthenticated access to dev environment" listed as one of the mitigations, but maybe I glossed over it.


You got it backwards. BeyondCorp-like systems would have prevented that.

(FYI: I'm very familiar with BeyondCorp, as I was on an adjacent team when it was invented. Now I am an SRE at Stack Overflow, where the incident happened.)


It's in the remediations section, but maybe the wording isn't clear:

*> Hardening code paths that allow access into our dev tier. We cannot take our dev tier off of the internet because we have to be able to test integrations with third-party systems that send inbound webhooks, etc. Instead, we made sure that access can only be gained with access keys obtained by employees and that features such as impersonation only allow de-escalation—i.e. it only allows lower or equal privilege users to the currently authenticated user. We also removed functionality that allowed viewing emails, in particular account recovery emails.*

There was no "unauthenticated" access into dev - the access key here is what allows login at all to our dev environment, but the attacker was able to bypass that protection.
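The de-escalation rule described in that remediation can be sketched as a simple guard (hypothetical privilege levels for illustration, not Stack Overflow's actual code):

```python
from enum import IntEnum

# Hypothetical privilege ladder, ordered low to high.
class Privilege(IntEnum):
    USER = 1
    MODERATOR = 2
    COMMUNITY_MANAGER = 3
    DEVELOPER = 4

def can_impersonate(actor: Privilege, target: Privilege) -> bool:
    # De-escalation only: you may impersonate accounts at or below
    # your own privilege level, never above it.
    return target <= actor
```

With a rule like this, the attack path in the post (a CM-level session impersonating its way up to a developer account) is cut off: impersonation can no longer be a privilege-escalation primitive.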


Thanks, yeah I missed that on account of misunderstanding the nature of the access (bug vs token shenanigans)


> Hardening code paths that allow access into our dev tier. We cannot take our dev tier off of the internet because we have to be able to test integrations with third-party systems that send inbound webhooks, etc. Instead, we made sure that access can only be gained with access keys obtained by employees and that features such as impersonation only allow de-escalation—i.e. it only allows lower or equal privilege users to the currently authenticated user. We also removed functionality that allowed viewing emails, in particular account recovery emails.


I wouldn't say that, but I would say that 2FA is insufficient. MFA, where the "M" includes some non-zero level of trust in the devices used for user and application runtime becomes essential.

"2FA", as commonly implemented in many scenarios is weak and only helps address certain scenarios -- TOTP tokens, for example, are pretty trivial to compromise. Critical infrastructure needs hardened tokens and clients with more controls.


I agree. I’ve employed the BeyondCorp philosophy behind a VPN as an extra measure of security, which is to say that all services are authenticated and encrypted inside the VPN perimeter. As shown in this article, service accounts are a major concern for attacker lateral movement which can’t be effectively protected with just 2FA.


The report describes a security breach in 2019; the report was held back until now for legal reasons:

> Sunday May 5th

> ...a login request is crafted to our dev tier that is able to bypass the access controls limiting login to those users with an access key. The attacker is able to successfully log in to the development tier.

> Our dev tier was configured to allow impersonation of all users for testing purposes, and the attacker eventually finds a URL that allows them to elevate their privilege level to that of a Community Manager (CM). This level of access is a superset of the access available to site moderators.

EDIT: clarified that the report was held back


The breach itself was announced shortly after it was discovered: https://stackoverflow.blog/2019/05/16/security-update/

And affected users were notified once identified, which was shortly after the announcement: https://stackoverflow.blog/2019/05/17/update-to-security-inc...

This is an update with more details, which was held back for legal reasons.


Yes, thank you. My wording was ambiguous; my bad.


tl;dr:

1. Attacker found a Stack Overflow dev environment requiring a login/password and access key to get in.

2. Attacker was able to log in to the dev environment with their credentials from prod (stackoverflow.com) via a replay attack based on logging in to prod.

3. The dev environment allows viewing outgoing emails, including password reset magic links. The attacker triggered a password reset on a dev account and changed the credentials. This gives them access to "site settings."

4. Settings listed TeamCity credentials. The attacker logged into TeamCity.

5. Attacker spends a day or so getting up to speed with TeamCity, in part by reading StackOverflow questions.

6. Attacker browses the build server file system, which includes a plaintext SSH key for GitHub.

7. Attacker clones all the repos.

8. Attacker alters the build system to execute an SQL migration that escalates them to a super-moderator on production (Saturday May 11th).

9. Community members make a security report on Sunday May 12th; Stack Overflow's response found the TeamCity account was compromised and moved it offline.

10. Stack Overflow determines the full extent of the attack over the next few days.


> Fortunately, we have a database containing a log of all traffic to our publicly accessible properties

https://stackoverflow.com/legal/privacy-policy

GDPR anyone?


> When you visit the Network or use our Apps, Stack Overflow automatically receives and records information from your browser or mobile device, such as your Internet Protocol (IP) address or unique device identifier. Cookies and data about which pages you visit on our Network allow us to operate and optimize the Products and Services we provide to you. This information is stored in secure logs and is collected automatically.

Clear as day that they're doing exactly that. You agree to this when you use the site.


Collecting data might well be acceptable under the GDPR.

However, what makes it legal isn't whether it is written in the ToS or in a cookie banner.

AFAIK what matters is either:

- if you have a specific, valid (according to the GDPR) reason,

- or if you have the user's free and informed consent.

... and yes, I think a number of the things I still see on the web are not OK:

- dark patterns where if you click manage settings everything is opted out, but there's a big green "Accept everything" and a small bland "Confirm my choices"? Doesn't fly, because of the rule that it should be equally easy to opt out.

- Cookie banners with no real opt out? No way.

- Cookie banners where you have to deselect 927 "partners"? Also no way.

The only ones that seem legal are those that either use a bare minimum of cookies for preserving state, or those that allow one to opt out directly but inform you that ads might become less relevant.



