Downloading YouTube videos, the programmer HOWTO
Have you ever thought about writing a script or program to download the Flash videos from YouTube? I have, and I’ve done a little research (enough to know my way around) on how to take the URL of the page where you watch a video and work out the URL of the actual video file from it.
It’s all fairly simple, really. When you open the page where you watch a specific video and view its source code (mixed HTML and JavaScript), it may be hard to see how YouTube locates the video. To keep this article short and simple, I’ll get to the point: every page that lets you watch a particular video contains, somewhere in its source, the definition of a JavaScript object called swfArgs.
Most of the information in this object is hardly relevant to retrieving the URL of the actual Flash video. What matters are the video_id and t attributes. With your favorite text editor and programming language, you can write a script or program that parses these attributes out of the object quite easily; I won’t go into the details of how to do that. Once you have the values of both attributes, you must construct an entirely different URL from them. The correct URL looks like this: http://www.youtube.com/get_video?video_id=<video_id>&t=<t>. Replace <video_id> and <t> with the values you parsed out of the swfArgs object.
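For readers who want a starting point, here is a minimal Python sketch of the parsing and URL-building step. The layout of the swfArgs object in SAMPLE_PAGE is an assumption made for illustration (the real page source may format it differently), and the function names extract_swf_args and build_get_video_url are hypothetical.

```python
import re
from urllib.parse import urlencode

# Hypothetical page fragment: the exact layout of swfArgs is an assumption.
SAMPLE_PAGE = '''
<script type="text/javascript">
var swfArgs = {"hl": "en", "video_id": "abc123XYZ_-", "l": "212", "t": "OEgsToPDskK3", "sk": "x"};
</script>
'''


def extract_swf_args(page_source):
    """Return (video_id, t) as found in the swfArgs object of the page source."""
    block = re.search(r"swfArgs\s*=\s*\{(.*?)\}", page_source, re.DOTALL)
    if block is None:
        raise ValueError("swfArgs object not found")
    # Collect key/value pairs; keys may be quoted or bare, values must be quoted.
    pairs = dict(
        re.findall(r'["\']?(\w+)["\']?\s*:\s*["\']([^"\']*)["\']', block.group(1))
    )
    try:
        return pairs["video_id"], pairs["t"]
    except KeyError as missing:
        raise ValueError(f"swfArgs has no {missing} attribute") from None


def build_get_video_url(video_id, t):
    """Build the get_video URL described in the article from the two values."""
    query = urlencode({"video_id": video_id, "t": t})
    return "http://www.youtube.com/get_video?" + query


if __name__ == "__main__":
    video_id, t = extract_swf_args(SAMPLE_PAGE)
    print(build_get_video_url(video_id, t))
```

A regular expression is enough here because only two flat string values are needed; a stricter parser would be safer if the object ever contained nested or unquoted values.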
Okay, you’re roughly halfway there. That was the easy part. Now comes the hard part, which isn’t really hard if you have some good tools available to you. With the above URL constructed, you must use your favorite HTTP library to make a request to it. For performance reasons, I recommend a HEAD request instead of a GET request: the web server answers a HEAD request with headers only and no body, which can speed things up significantly on slower connections.
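As a rough illustration of a single HEAD request, here is a sketch using Python’s standard http.client module, which does not follow redirects on its own. The helper names request_target and head are made up for this example, and the header names are lowercased purely for convenience.

```python
import http.client
from urllib.parse import urlsplit


def request_target(url):
    """Return the path-plus-query part of a URL, as sent on the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def head(url, timeout=10.0):
    """Send one HEAD request and return (status, headers), without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("HEAD", request_target(url))
        response = conn.getresponse()
        # A HEAD response carries headers only, so there is no body to read.
        headers = {name.lower(): value for name, value in response.getheaders()}
        return response.status, headers
    finally:
        conn.close()
```

Calling head() on the constructed get_video URL would return the status code together with the headers, including any Location header.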
In every case (I can almost bet on this) YouTube will answer with a redirect to another location. Keep in mind that YouTube is very large and hosts a huge number of videos. It presumably keeps redirecting you because it is working out which of its Google file servers (where all the videos are kept) actually stores the video. If the HTTP library you are using can follow redirect responses automatically, then that’s good. If it can’t, you will have to write a loop or a recursive function to do this for you. If you end up writing your own function, all you have to do is keep following the URL given in the “Location:” header of each response the server sends back to you.
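If you do have to follow the redirects yourself, a loop along these lines might work. This is only a sketch: fetch stands for any function that performs one HEAD request and returns a (status, headers) pair with lowercase header names, and the hosts in FAKE_RESPONSES are invented so the example runs without touching the network.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_final_url(url, fetch, max_hops=10):
    """Follow Location headers until a 200 response; return the final URL."""
    for _ in range(max_hops + 1):
        status, headers = fetch(url)
        if status in REDIRECT_CODES:
            location = headers.get("location")
            if not location:
                raise RuntimeError(f"redirect without Location header at {url}")
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            continue
        if status == 200:
            return url
        raise RuntimeError(f"unexpected status {status} for {url}")
    raise RuntimeError(f"gave up after {max_hops} redirects")


# Invented responses standing in for the real servers (hypothetical hosts).
FAKE_RESPONSES = {
    "http://www.youtube.com/get_video?video_id=abc&t=xyz":
        (303, {"location": "http://cache1.example/get_video?video_id=abc"}),
    "http://cache1.example/get_video?video_id=abc":
        (302, {"location": "/videos/abc.flv"}),
    "http://cache1.example/videos/abc.flv":
        (200, {"content-type": "video/x-flv"}),
}


def fake_fetch(url):
    """Look up a canned (status, headers) pair instead of making a request."""
    return FAKE_RESPONSES[url]


if __name__ == "__main__":
    start = "http://www.youtube.com/get_video?video_id=abc&t=xyz"
    print(resolve_final_url(start, fake_fetch))
```

A loop with a hop limit is used here instead of recursion so that a redirect cycle ends in a clear error rather than running forever.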
If you get that far without any errors, you’ve probably hit the jackpot. If the final response you get is an HTTP OK (status 200) response, then the URL you have reached is the actual download URL of the Flash video. If you made HEAD requests the whole way, nothing has been downloaded yet, because the video travels in the response body and a HEAD response has none. So make one more request to the actual download URL, this time a GET request, and that’s how you download the video.
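The final GET request could be sketched like this with Python’s urllib.request, whose urlopen sends a GET when no request data is given. The names save_stream and download are hypothetical; the body is copied in chunks so that a large video never has to fit in memory at once.

```python
import urllib.request


def save_stream(source, dest_path, chunk_size=64 * 1024):
    """Copy a binary file-like object to dest_path in chunks; return bytes written."""
    written = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written


def download(url, dest_path, timeout=30.0):
    """Fetch url with a GET request and save the response body to dest_path."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return save_stream(response, dest_path)
```

With the final URL in hand, download(final_url, "video.flv") would write the video to disk and return its size in bytes.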
I made a command-line PHP script called TubeScraper that depends only on cURL; it takes the YouTube URL as its argument and spits back the actual download URL. The script is more suited for *nix environments but will work anywhere PHP is supported.
That’s it, I hope you learned something from this article. If you have any questions or feedback, just leave comments and I’ll address them as soon as possible.
About this entry
You’re currently reading “Downloading YouTube videos, the programmer HOWTO,” an entry on Matthew Harris.
- Published: 12.14.07 11:44
- Category: PHP, Programming, YouTube