I will be using Ruby's fog gem and Mechanize to launch a new proxy instance on EC2, connect through it, and scrape some simple content.
First, go back and find the AMI ID of the proxy AMI you created. It's listed in the AMIs section of your EC2 dashboard, and the ID will look like ami-xxxxxxxx.
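I like to keep values like this in my environment (ENV) so they don't end up in a script floating around on the internet. On Windows, add PROXY_AMI to your environment variables; on Linux, put

```bash
export PROXY_AMI=ami-xxxxxxxx
```

in your .bash_profile. The script also reads your AWS credentials from AMAZON_ACCESS_KEY_ID and AMAZON_SECRET_ACCESS_KEY, so set those the same way.

The script: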
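Since everything depends on PROXY_AMI being set, it can help to fail fast when it isn't. Below is a minimal sketch of a hypothetical helper, `proxy_ami_from`, that reads the variable and sanity-checks its shape, assuming AMI IDs come either in the older 8-hex-digit form shown above or in the longer 17-hex-digit form AWS uses for newer images.

```ruby
# Hypothetical helper: read PROXY_AMI and check that it looks like an AMI ID
# (assumes either the 8- or the 17-hex-digit format).
AMI_ID_PATTERN = /\Aami-(?:\h{8}|\h{17})\z/

def proxy_ami_from(env = ENV)
  # fetch raises KeyError with a clear message if the variable is unset
  ami = env.fetch('PROXY_AMI')
  unless ami.match?(AMI_ID_PATTERN)
    raise ArgumentError, "PROXY_AMI does not look like an AMI ID: #{ami.inspect}"
  end
  ami
end

puts proxy_ami_from('PROXY_AMI' => 'ami-1a2b3c4d') # prints ami-1a2b3c4d
```

In the real script you would call `proxy_ami_from` with no argument (so it reads the actual ENV) in place of `ENV['PROXY_AMI']`.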
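```ruby
require 'mechanize'
require 'fog'

# Connect to EC2 with credentials taken from environment variables
compute = Fog::Compute.new :provider => 'AWS',
                           :aws_access_key_id => ENV['AMAZON_ACCESS_KEY_ID'],
                           :aws_secret_access_key => ENV['AMAZON_SECRET_ACCESS_KEY']

# Launch an instance from the proxy AMI and block until EC2 reports it ready
proxy = compute.servers.create :image_id => ENV['PROXY_AMI']
proxy.wait_for { ready? }
```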
At this point we have spun up the proxy instance and waited for EC2 to report it as ready. That doesn't yet mean it can proxy our requests: there is a delay between the instance booting and the proxy process starting.
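One way to observe that gap is to probe the proxy port directly. This is a sketch using only Ruby's standard socket library; `port_open?` is a hypothetical helper, and an open port only shows that something is listening, not that it will actually proxy requests. The demo uses a local listener so it runs without EC2.

```ruby
require 'socket'

# Hypothetical helper: true if something accepts TCP connections on host:port.
# An open port means a process is listening, not that it proxies correctly.
def port_open?(host, port, timeout: 2)
  # Socket.tcp connects, yields the socket, and closes it after the block
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # e.g. connection refused, timed out, or the host name didn't resolve
  false
end

# Demo against a local listener instead of an EC2 instance
server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
port = server.addr[1]
puts port_open?('127.0.0.1', port) # prints true
server.close
```

Against the real instance, something like `sleep 1 until port_open?(proxy.public_ip_address, 8080)` would block until the proxy process starts listening.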
```ruby
agent = Mechanize.new
agent.set_proxy proxy.public_ip_address, 8080
```
Now we have instantiated the Mechanize agent and pointed it at our new instance as its proxy. The proxy does not need to be ready yet for us to call set_proxy with its address.
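The same holds for Ruby's standard Net::HTTP, which Mechanize ultimately sits on (through the net-http-persistent gem): configuring a proxy only records its address, and no connection is made until the first request. A small illustration, where 203.0.113.10 is a placeholder documentation-range address standing in for the instance's public IP:

```ruby
require 'net/http'

# Building the client with a proxy opens no connection; it just stores settings.
# 203.0.113.10 is a placeholder address, not a real proxy.
http = Net::HTTP.new('www.google.com', 80, '203.0.113.10', 8080)

puts http.proxy?        # prints true
puts http.proxy_address # prints 203.0.113.10
puts http.proxy_port    # prints 8080
puts http.started?      # prints false: nothing has been sent yet
```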
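```ruby
# Retry until a request through the proxy succeeds; any StandardError
# (e.g. connection refused while the proxy starts) becomes nil and we loop
until (page = (agent.get('http://www.google.com/') rescue nil))
  sleep 1
  puts 'waiting for proxy'
end
```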
There's some debate about the best way to check whether a proxy is working, but I think it's best to just keep trying to connect until it works.
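One weakness of that loop is that it retries forever if the proxy never comes up (for example, if the security group blocks the port). A sketch of a bounded version, using a hypothetical `wait_until` helper and an arbitrary default timeout of 300 seconds:

```ruby
# Raised when the condition never becomes true within the timeout
class WaitTimeout < StandardError; end

# Hypothetical helper: call the block until it returns a truthy value,
# treating any StandardError as "not yet", and give up after `timeout` seconds.
def wait_until(timeout: 300, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = begin
      yield
    rescue StandardError
      nil
    end
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeout, "gave up after #{timeout}s"
    end
    sleep interval
  end
end

attempts = 0
value = wait_until(timeout: 5, interval: 0.01) do
  attempts += 1
  raise 'not ready' if attempts < 3 # simulate two failed connection attempts
  :connected
end
puts "#{value} after #{attempts} attempts" # prints connected after 3 attempts
```

In the scraping script this would replace the `until` loop with something like `page = wait_until { agent.get('http://www.google.com/') }`.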
```ruby
puts page.title
proxy.destroy
```
Don't forget that last destroy line unless you want a big surprise on your next AWS bill.
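Note that the destroy line only runs if everything before it succeeds; if the scrape raises an exception, the instance keeps running. A sketch of one way to guard against that with Ruby's `ensure`, using a hypothetical `with_server` wrapper and a stand-in `FakeServer` so the example runs without AWS:

```ruby
# Hypothetical wrapper: yield the server, then destroy it whether the block
# finishes normally or raises, so a crash can't leave the instance running.
def with_server(server)
  yield server
ensure
  server.destroy
end

# Stand-in for a fog server so this runs without AWS; only #destroy matters.
FakeServer = Struct.new(:destroyed) do
  def destroy
    self.destroyed = true
  end
end

server = FakeServer.new(false)
begin
  with_server(server) { raise 'scrape failed' }
rescue RuntimeError => e
  puts "scrape error: #{e.message}" # prints scrape error: scrape failed
end
puts server.destroyed # prints true
```

With fog you would wrap the real server, e.g. `with_server(compute.servers.create(:image_id => ENV['PROXY_AMI'])) { |proxy| ... }`, and move the waiting and scraping into the block.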
One last gotcha: make sure your default security group allows inbound connections on port 8080 (or whichever port your proxy listens on). I like to allow all TCP traffic from my development machine's IP address.
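In security group terms, that rule is a protocol (TCP), a port range (0 to 65535), and a source CIDR made of your IP with a /32 mask, which matches exactly one address. A small sketch with the standard ipaddr library, where 203.0.113.7 is a placeholder for your own IP and the `rule` hash is just illustrative data:

```ruby
require 'ipaddr'

# Placeholder for your development machine's public IP
DEV_IP = '203.0.113.7'

# Illustrative description of "all TCP from my machine"; /32 = a single address
rule = { ip_protocol: 'tcp', ports: 0..65535, cidr_ip: "#{DEV_IP}/32" }

source = IPAddr.new(rule[:cidr_ip])
puts source.include?(IPAddr.new(DEV_IP))        # prints true
puts source.include?(IPAddr.new('203.0.113.8')) # prints false
puts rule[:ports].cover?(8080)                  # prints true
```

If you'd rather script the rule than click through the console, fog's AWS security group model has an `authorize_port_range` method that, as far as I know, takes a range plus `:ip_protocol` and `:cidr_ip` options; check the documentation for your fog version before relying on it.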