Using Ruby’s http library - download and process web pages - I
Ruby has excellent networking support. It offers low-level features such as sockets and TCP/IP, as well as a high-level API for protocols such as HTTP and FTP. In this post we will look at Ruby's HTTP library (net/http) and at how it can be used to download and process web pages.
1. Downloading a web page using Ruby
The following code uses the net/http library to download Google's home page. You should see the HTML of the Google home page in the console.
require 'net/http'
require 'uri'
class HttpSample
  # Download Google's home page and print the raw HTML
  def downloadGoogleHome
    http_response = Net::HTTP.get_response(URI.parse('http://www.google.com/'))
    puts http_response.body
  end
end
s = HttpSample.new
s.downloadGoogleHome
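The sample above prints whatever the server sends back, even an error page. A minimal sketch of checking the status before using the body follows; `fetch_page` is a hypothetical helper name, while `Net::HTTPSuccess`, `code` and `message` are standard parts of net/http.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: download a page and return its HTML only when the
# server answers with a 2xx status; any other status raises an error.
def fetch_page(url)
  response = Net::HTTP.get_response(URI.parse(url))
  unless response.is_a?(Net::HTTPSuccess)
    # This branch also catches 3xx redirects, not just 4xx/5xx errors
    raise "GET #{url} failed: #{response.code} #{response.message}"
  end
  response.body
end

# Usage (needs network access):
# puts fetch_page('http://www.google.com/')
```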
On my machine this returns a page that says “document has moved”, because Google sends a redirect to www.google.co.in. The following code shows how to handle an HTTP redirect.
require 'net/http'
require 'uri'
class HttpSample
  def downloadGoogleHome
    url = URI.parse('http://www.google.com/')
    http_response = Net::HTTP.get_response(url)
    # A 3xx response carries the new address in its Location header
    if http_response.kind_of?(Net::HTTPRedirection)
      # URI.join also copes with a relative Location such as "/home"
      new_url = URI.join(url.to_s, http_response['Location'])
      http_response = Net::HTTP.get_response(new_url)
    end
    puts http_response.body
  end
end
s = HttpSample.new
s.downloadGoogleHome
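The version above follows exactly one redirect. Sites sometimes chain several, so a loop with an upper bound may be more robust. This is a sketch only; `get_following_redirects` and the default limit of 5 are illustrative assumptions, not part of net/http.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: follow at most max_redirects redirects and return the
# first non-redirect response. Relative Location headers are resolved
# against the URL that produced them.
def get_following_redirects(url, max_redirects = 5)
  uri = URI.parse(url)
  # One initial request plus up to max_redirects follow-up requests
  (max_redirects + 1).times do
    response = Net::HTTP.get_response(uri)
    return response unless response.is_a?(Net::HTTPRedirection)
    uri = URI.join(uri.to_s, response['Location'])
  end
  raise "More than #{max_redirects} redirects starting from #{url}"
end

# Usage (needs network access):
# puts get_following_redirects('http://www.google.com/').body
```

Bounding the loop guards against redirect cycles, which would otherwise make the program request pages forever.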
What if we need to go through a proxy server to connect to the Internet? In Ruby this is simple, as the new version below shows.
require 'net/http'
require 'uri'
class HttpSample
  def downloadGoogleHome
    # Replace 'ipaddress' and portnumber with your proxy's actual IP address
    # (or bare host name, without http://) and port number
    proxy = Net::HTTP::Proxy('ipaddress', portnumber)
    url = URI.parse('http://www.google.com')
    http_response = proxy.get_response(url)
    puts http_response.body
  end
end
s = HttpSample.new
s.downloadGoogleHome
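`Net::HTTP::Proxy` takes a bare host name or IP address plus a port, so passing a URL such as `http://host:8080` as the address typically fails with a SocketError. If the proxy is known as a URL (for example from the `http_proxy` environment variable), a hypothetical helper like `proxy_class_for` can split it first. This is a sketch; the helper name and the example proxy URL are assumptions.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a proxy-aware Net::HTTP class from a proxy URL
# such as "http://proxy.example.com:8080". Net::HTTP::Proxy wants the bare
# host and the port as separate arguments, not a full URL.
def proxy_class_for(proxy_url)
  # No proxy configured: plain Net::HTTP connects directly
  return Net::HTTP if proxy_url.nil? || proxy_url.empty?
  proxy = URI.parse(proxy_url)
  Net::HTTP::Proxy(proxy.host, proxy.port, proxy.user, proxy.password)
end

# Usage, assuming the proxy URL is in the http_proxy environment variable:
# http = proxy_class_for(ENV['http_proxy'])
# puts http.get_response(URI.parse('http://www.google.com/')).body
```

Since Ruby 2.0, `Net::HTTP` also consults the `http_proxy` environment variable by default, so on modern Rubies setting that variable may be enough on its own.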
I have been trying to write a test program to download a web page, but it wasn't working, so I searched and found this post. With this code I get the same error. I am behind a proxy; the details are below. Please help me with this.
C:\myprograms>ruby -v
ruby 1.8.6 (2008-08-11 patchlevel 287) [i386-mswin32]
test2.rb:
require 'net/http'
class HttpSample
  def downloadGoogleHome
    proxy = Net::HTTP::Proxy('autocache.hp.com', 8080) # use actual ip and port
    url = URI.parse('http://www.google.com')
    http_response = proxy.get_response(url)
    puts http_response.body
  end
  s = HttpSample.new
  s.downloadGoogleHome
end
Error:
C:\myprograms>ruby test2.rb
C:/Ruby/lib/ruby/1.8/net/http.rb:560:in `initialize': A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. - connect(2) (Errno::ETIMEDOUT)
from C:/Ruby/lib/ruby/1.8/net/http.rb:560:in `open'
from C:/Ruby/lib/ruby/1.8/net/http.rb:560:in `connect'
from C:/Ruby/lib/ruby/1.8/timeout.rb:53:in `timeout'
from C:/Ruby/lib/ruby/1.8/timeout.rb:93:in `timeout'
from C:/Ruby/lib/ruby/1.8/net/http.rb:560:in `connect'
from C:/Ruby/lib/ruby/1.8/net/http.rb:553:in `do_start'
from C:/Ruby/lib/ruby/1.8/net/http.rb:542:in `start'
from C:/Ruby/lib/ruby/1.8/net/http.rb:379:in `get_response'
from test2.rb:6:in `downloadGoogleHome'
from test2.rb:10
I have also tried giving the proxy name with http:// in front, and that gives a SocketError. Using the proxy's IP address instead gives the same kind of error. Can you please help me with this?
I tried to use your last code, i.e.:
require 'net/http'
class HttpSample
  def downloadGoogleHome
    proxy = Net::HTTP::Proxy('ipaddress', portnumber) # use actual ip and port
    url = URI.parse('http://www.google.com')
    http_response = proxy.get_response(url)
    puts http_response.body
  end
  s = HttpSample.new
  s.downloadGoogleHome
end
But it gives me an error:
SocketError
getaddrinfo: Name or service not known
Can you help me fix it?
Don't forget OpenURI:
require 'open-uri'
# URI.open needs Ruby 2.5+; older Rubies used plain open(...) after requiring open-uri
URI.open("http://www.ruby-lang.org/") { |f|
  f.each_line { |line| p line }
}
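OpenURI wraps net/http, follows redirects on its own, and extends the object it yields with response metadata such as `status`, `content_type` and `base_uri`. A sketch, assuming Ruby 2.5 or later for `URI.open`; `fetch_with_open_uri` is a hypothetical name, and the optional `proxy:` value is passed through as open-uri's `:proxy` option.

```ruby
require 'open-uri'

# Hypothetical helper: fetch a page with OpenURI and collect some metadata.
# OpenURI follows redirects itself; base_uri is the URL finally fetched.
def fetch_with_open_uri(url, proxy: nil)
  # e.g. proxy: "http://proxy.example.com:8080" (an assumed address)
  options = proxy ? { proxy: proxy } : {}
  URI.open(url, options) do |page|
    {
      status: page.status,             # e.g. ["200", "OK"]
      content_type: page.content_type, # media type without parameters
      final_url: page.base_uri.to_s,
      body: page.read
    }
  end
end

# Usage (needs network access):
# info = fetch_with_open_uri('http://www.ruby-lang.org/')
# puts info[:final_url]
```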