Hacker News
Ask HN: Has Facebook 'Download your information' become very slow?
17 points by nischalsamji on Dec 13, 2018 | 6 comments
I requested a copy of my Facebook data on December 2nd. The status page still says my download is pending. Has anyone else run into this?

I downloaded my information previously in a couple of instances and it was very fast then.




I'd love to understand what is going on mechanically when these requests are made. What's actually being done that makes my Google data download request take days to fulfill?

My uninformed guess is that it's a service which orchestrates internal API calls to all the other services and builds a tarball. And the reason it takes forever is probably mostly just low priority queuing of all these various requests.
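To make that guess concrete, here is a minimal sketch of what such an orchestrator could look like. Everything in it is hypothetical: `build_archive`, the `exporters` mapping, and the two fake services are stand-ins for illustration, not anything Facebook or Google is known to run. It only shows the shape of "call each service's export, then pack the results into a tarball".

```python
import io
import tarfile
from typing import Callable, Dict


def build_archive(exporters: Dict[str, Callable[[], bytes]]) -> bytes:
    """Call every (hypothetical) per-service exporter and pack results into a .tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for service_name, export in exporters.items():
            # If any exporter raises here, the whole archive fails,
            # which is one way a single slow or down service stalls the request.
            payload = export()
            info = tarfile.TarInfo(name=f"{service_name}.json")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


# Fake services standing in for internal export APIs (assumed, not real).
exporters = {
    "photos": lambda: b'{"photos": []}',
    "messages": lambda: b'{"messages": []}',
}

archive = build_archive(exporters)
with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
    names = tar.getnames()
print(names)
```

A real system would presumably queue and retry each export separately rather than run them in one loop, which is where the low-priority queuing delay would come in.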


I can provide insight here...

A typical archive might touch 50+ services. Each of those services has an API to export data which is called. If any service is down, the whole thing is delayed.

Internally, each service has to go retrieve all the data. All the data. That's typically a very expensive operation. A datastore for a document editor might be designed for an average user to store 100k documents but only access 10 per day. There's a good chance the data is sharded per user, which means the work of retrieving all the data is going to fall on just one machine/storage server/application server/rendering server/whatever. That server still has other users to service too, so we can't hammer it flat out with your request.

Many types of data, when old, get archived on hard disk, since the chance of a user accessing an email attachment from 2009 is very, very low. When creating a mail archive, however, all those old mail attachments need to be read. Remember, there's a good chance they're sharded by user, and therefore all sit on a small set of disks.

Remember most of the applications were designed before data exporting was a thing, so typically there is no API to read all data, and instead it must be implemented as a 'list all objects then retrieve objects one/a few at a time'.
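As a rough illustration of that "list, then fetch a few at a time" pattern, here is a small sketch. The names `export_all`, `list_ids`, and `fetch_batch` are made up for this example, and the in-memory `store` stands in for a real datastore; no actual service API is implied.

```python
from typing import Callable, Iterable, Iterator, List


def export_all(
    list_ids: Callable[[], Iterable[str]],
    fetch_batch: Callable[[List[str]], List[bytes]],
    batch_size: int = 3,
) -> Iterator[bytes]:
    """Export every object by listing IDs, then fetching them in small batches."""
    batch: List[str] = []
    for object_id in list_ids():
        batch.append(object_id)
        if len(batch) == batch_size:
            yield from fetch_batch(batch)
            batch = []
    if batch:
        # Fetch whatever is left over after the last full batch.
        yield from fetch_batch(batch)


# Toy in-memory store standing in for a per-user shard (an assumption).
store = {f"doc-{i}": f"contents {i}".encode() for i in range(7)}
calls: List[List[str]] = []


def fake_list() -> List[str]:
    return sorted(store)


def fake_fetch(ids: List[str]) -> List[bytes]:
    calls.append(list(ids))  # record each round trip to the store
    return [store[i] for i in ids]


exported = list(export_all(fake_list, fake_fetch))
print(len(exported), "objects in", len(calls), "round trips")
```

The point is that the number of round trips grows with the number of objects, so an export costs far more than the handful of reads per day the store was sized for.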

If a disk seek on a 7200 rpm disk takes 10 milliseconds, and you have 1 million mail attachments to retrieve in random order, that's 10,000 seconds of seeking, or nearly 3 hours, assuming no other load on that disk cluster.
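The seek arithmetic can be checked with a few lines. This is a back-of-the-envelope sketch only: the helper name `seek_hours` is made up, and it assumes one random seek per object, reads done strictly one after another, and no transfer time.

```python
def seek_hours(num_objects: int, seek_ms: float) -> float:
    """Hours spent on seeks alone, assuming one random seek per object."""
    total_seconds = num_objects * seek_ms / 1000.0
    return total_seconds / 3600.0


# 1 million attachments at 10 ms per seek, as in the comment above.
hours = seek_hours(1_000_000, 10.0)
print(f"{hours:.2f} hours")  # prints "2.78 hours"
```

Spreading the reads across several disks would cut the wall-clock time, but that is exactly what per-user sharding makes hard.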


Thanks for sharing. A lot of concepts for me to parse and Google. I'm having a fun evening of it.


I've been doing this regularly for years, and it's always taken a few days - so I wouldn't describe it as "very fast", but I've never seen it take more than a week.


Yea... The last time I did it, it took me 4 hours. Now it's been pending for 12 days as of today.


I submitted a Facebook request on Nov. 18 and had the file on Nov. 19. I submitted a WhatsApp request 8 days ago and still haven't received anything. Interestingly, their service promises it within two (or maybe three?) business days.



