How to Parse an HTML File With Ruby
Ruby is a general-purpose scripting language, widely used for Web development and similar in philosophy to Python or PHP. It was designed to produce human-readable code that is easy to write, deploy and debug. Ruby also ships with a package manager called "gem" (RubyGems), which installs libraries that perform various tasks. For example, after installing the "Nokogiri" library through gem, you can parse HTML files with a few simple method calls.
Instructions
1
Install Nokogiri, an HTML parser for Ruby, using the gem installer. Run the following command in a terminal window:
sudo gem install nokogiri
2
In your Ruby code, load the Nokogiri library with the "require" method:
#!/usr/bin/ruby
require 'nokogiri'
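If the gem is not installed, "require" raises a LoadError. As a minimal sketch (not part of the original steps), the load can be wrapped so the script reports the problem clearly. The variable name nokogiri_loaded is chosen here only for illustration:

```ruby
# Attempt to load Nokogiri and report a clear message if the gem is missing.
begin
  require 'nokogiri'
  nokogiri_loaded = true
  # Nokogiri::VERSION holds the installed gem version as a string.
  puts "Nokogiri #{Nokogiri::VERSION} loaded"
rescue LoadError
  nokogiri_loaded = false
  warn "Nokogiri is not installed; run: gem install nokogiri"
end
```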
3
Parse a sample HTML document into a Nokogiri document object. The resulting object holds the parsed markup along with its text content:
require 'nokogiri'
# Parse the HTML held in the heredoc into a document object
doc = Nokogiri::HTML(<<-eohtml)
<html>
<body>
<div>
<h1>Hello world</h1>
</div>
</body>
</html>
eohtml