| Technobabble Post your general Need for Help questions here. • Lossy or Lossless? Moderators | 
| 
			 
			#1  
			
			
			
			
			
		 | |||
| 
 | |||
| 
				
				Any good Python coders here? Especially with regard to web scraping?
			 
			
I'm trying to scrape the torrent details from TTD to put into a database, which I would make available online in the event that TTD closes down. Please see this thread for more details: http://www.thetradersden.org/forums/...d.php?t=202461

I've not used the requests package before, and I'm having problems. I'm starting from this page: http://www.thetradersden.org/forums/....php/f-11.html, then going into the various list pages for some of the categories (e.g. Audio, Audio Inactive, Audio Pulled). The next-level pages list torrents, 250 per page, each with a link to the torrent detail thread; e.g. http://www.thetradersden.org/forums/....php/f-12.html is page 1 of 147 listing Active Audio torrents.

Sometimes my scraping code retrieves an index page, but at other times the request returns a status of 200 with an empty response. Trying to retrieve a torrent thread, e.g. http://www.thetradersden.org/forums/.../t-203252.html, always gives a 200 status and an empty response. If the request had failed, I would expect a non-200 status code, but I don't get one.

Could it be authentication? Caching? Is the TTD backend blocking scraping of torrent detail threads? TIA

I realise that web scraping these pages may not be the best way to get the info; a better way would be a dump/extract of the backend database(s). If the scraping does work, I would be mindful of NOT scraping 100,000+ pages quickly.

The following members like this post: PanTau, Mr. Clumpy |
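One way to narrow down a "200 but empty" response is to log what the server actually sent back: the body length and a few headers that often hint at caching, compression, or an error page served with a success status. The following is only a minimal sketch. The helper names `diagnose` and `fetch_and_diagnose`, the User-Agent string, and the choice of headers to report are all assumptions for illustration, not anything taken from the poster's code or from TTD itself.

```python
def diagnose(status_code, headers, body):
    """Return short observations about one HTTP response.

    `headers` is any mapping of header names to values and `body` is the
    raw response bytes. Nothing here is specific to TTD.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    notes = [f"status {status_code}"]
    if body.strip():
        notes.append(f"body of {len(body)} bytes")
    else:
        notes.append("empty body")
    # Headers that are often useful when a 200 arrives with no content.
    for name in ("content-length", "content-type",
                 "content-encoding", "cache-control"):
        if name in lowered:
            notes.append(f"{name}: {lowered[name]}")
    return notes


def fetch_and_diagnose(url, user_agent="ttd-archive-test/0.1 (hypothetical)"):
    """Fetch `url` with the third-party requests package and diagnose it."""
    import requests  # imported lazily so diagnose() works without it

    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=30)
    return diagnose(response.status_code, response.headers, response.content)
```

Comparing the output for an index page that sometimes works against a thread page that never does may show whether the two are being served differently. It will not by itself tell you whether authentication or blocking is the cause.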
| 
			 
			#2  
			
			
			
			
			
		 | |||
| 
 | |||
| 
				
				Re: Any good Python coders here? Especially with regard to web scraping?
			 
			
			What's happening is that your scraper is trying to grab the page data before it has loaded, because a JS file loads before the content. For example: http://www.thetradersden.org/forums/.../t-203252.html If you open the dev tools in your browser and look at the console, this is the error: Quote: 
 Quote: 
 Quote: 
 https://selenium-python.readthedocs.io/waits.html | 
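The linked page describes Selenium's explicit waits, which pause until a condition holds instead of reading the page immediately. As a rough sketch of that idea, assuming the empty response really is caused by content that arrives after a script runs: the function name `fetch_rendered_html` and the `content_selector` parameter are hypothetical, and the right selector for a TTD thread page is not known from this thread.

```python
def fetch_rendered_html(url, content_selector, timeout_seconds=15):
    """Load `url` in a real browser and return the HTML once an element
    matching `content_selector` is present.

    Requires the third-party selenium package plus a local Firefox
    install; both are assumptions of this sketch.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Explicit wait: poll until the element exists; a TimeoutException
        # is raised if it has not appeared after `timeout_seconds`.
        WebDriverWait(driver, timeout_seconds).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, content_selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

A browser-driven fetch is much slower than a plain HTTP request, so for 100,000+ pages it would be worth first checking whether plain requests can be made to work.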
| 
			 
			#3  
			
			
			
			
			
		 | |||
| 
 | |||
| 
				
				Re: Any good Python coders here? Especially with regard to web scraping?
			 
			
			Thanks so much for your reply. I will study what you've suggested and see what happens. Would you be willing to engage in a direct conversation via email? All the best, Mike | 
| 
			 
			#4  
			
			
			
			
			
		 | |||
| 
 | |||
| 
				
				Re: Any good Python coders here? Especially with regard to web scraping?
			 
			
			Yes. Actually, after I closed the browser (I'm at work atm) I realized that I could probably just scrape it all tonight for you. I will DM you my email. The following members like this post:  Mr. Clumpy | 
|  | 
| The Traders' Den | 
| Tags | 
| archive, python, ttd | 