I don't know how to create WARCs usefully. Wget has a WARC option, but when I tried it, it created oddly named files that littered the www tree, and the names looked like they could collide so that files would be overwritten. Besides, the IA live request feature should already handle getting web pages into the IA.


Here, have a Gist for creating proper WARCs, ready for the Internet Archive:

https://gist.github.com/Asparagirl/6202872
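The Gist's contents aren't reproduced here. As a rough illustration only: GNU Wget (1.14 and later) has built-in WARC options, and the Python sketch below just assembles such a command. The particular flag selection and the function name `build_wget_warc_command` are assumptions, not necessarily what the Gist does. `--delete-after` is aimed at the "littered www tree" complaint: Wget removes the loose downloaded files, while the records should remain in the WARC.

```python
import shlex
import subprocess


def build_wget_warc_command(url: str, warc_basename: str) -> list[str]:
    """Hypothetical helper: argv for a Wget crawl that records into a WARC.

    Assumes a GNU Wget build with WARC support (1.14+).
    """
    return [
        "wget",
        "--recursive",                  # follow links within the site
        "--level=inf",                  # no depth limit
        "--page-requisites",            # also fetch CSS, images, scripts
        f"--warc-file={warc_basename}", # Wget appends .warc.gz itself
        "--warc-cdx",                   # also write a CDX index file
        "--delete-after",               # don't keep loose files in a www tree
        url,
    ]


if __name__ == "__main__":
    cmd = build_wget_warc_command("https://example.com/", "example-com")
    print(shlex.join(cmd))              # show the shell-equivalent command
    # Uncomment to actually run the crawl (requires wget on PATH):
    # subprocess.run(cmd, check=True)
```

Building an argv list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.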

And a Gist for uploading the completed WARCs to the IA using its S3-like service:

https://gist.github.com/Asparagirl/6206247
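Again, this is not the Gist itself, only a minimal sketch of what an upload through the IA's S3-like API can look like: an HTTP PUT to `https://s3.us.archive.org/<identifier>/<filename>` with a `LOW access:secret` authorization header. The `web` mediatype, the item identifier, and the helper names are assumptions for illustration; check the IA's S3 documentation before relying on any header.

```python
import os
import urllib.request
from urllib.parse import quote

IA_S3_ENDPOINT = "https://s3.us.archive.org"


def build_ia_upload(identifier: str, filename: str, access_key: str,
                    secret_key: str, title: str) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: URL and headers for one file PUT to an IA item."""
    url = f"{IA_S3_ENDPOINT}/{quote(identifier, safe='')}/{quote(filename, safe='')}"
    headers = {
        "authorization": f"LOW {access_key}:{secret_key}",
        "x-amz-auto-make-bucket": "1",        # create the item if it doesn't exist
        "x-archive-meta-mediatype": "web",    # assumption: mediatype for WARCs
        "x-archive-meta-title": title,
    }
    return url, headers


def upload_warc(path: str, identifier: str, access_key: str,
                secret_key: str, title: str) -> int:
    """Stream a local WARC to the IA and return the HTTP status code."""
    url, headers = build_ia_upload(identifier, os.path.basename(path),
                                   access_key, secret_key, title)
    # An explicit length lets urllib stream the file instead of buffering it.
    headers["Content-Length"] = str(os.path.getsize(path))
    with open(path, "rb") as f:
        req = urllib.request.Request(url, data=f, method="PUT", headers=headers)
        with urllib.request.urlopen(req) as resp:
            return resp.status
```

Passing the open file as `data` avoids reading a multi-gigabyte WARC into memory.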

The final step is to email someone at Archive Team with admin rights and ask them to move your IA upload into the proper "Archive Team" collection instead of "Community Texts". The awesome Mr. Jason Scott should be able to help you with that.