How to Tokenize Input Files in Java

The Java language provides a range of choices for reading and processing input files. Using the "FileReader," "BufferedReader" and "Scanner" classes, you can read the content of an external file one line at a time. Once a line is in your program, you can use methods of the String class, such as split, to break it into tokens and store those tokens in arrays for further processing. Tokenizing input files this way is a common first step whenever a program needs to work with the individual words or fields in a file.

Instructions

    • 1

      Import the Java classes your file input operation needs. The input process uses classes from the java.io and java.util packages, so add the following import statements at the top of your Java class file:

      import java.io.*;
      import java.util.Scanner;

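      The java.io import gives your program the "FileReader" and "BufferedReader" classes, along with the IOException class used for error handling; the java.util.Scanner import gives it the "Scanner" class. Together they let your program locate the file, open it as an input stream, read in its contents and process them.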

    • 2

      Create instances of the input classes you need. Input and output operations can throw exceptions, for example when the input file you specify cannot be located, so you need to place them in try and catch blocks. The "FileReader" constructor throws FileNotFoundException, a checked subclass of IOException, and the compiler will reject your code unless it is handled. Add the following code to your program:

      try {
          FileReader fr = new FileReader("testfile.txt");
          BufferedReader br = new BufferedReader(fr);
          Scanner scan = new Scanner(br);
          //further processing here
      }
      catch(IOException ioe) {
          System.out.println(ioe.getMessage());
      }

      This code creates instances of the "FileReader," "BufferedReader" and "Scanner" classes, which handle opening and reading the file. Change the file name in the "FileReader" line to the name and location of your own file. If an exception is thrown, the catch block prints the exception's message, for example a note that the file could not be found.
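The snippet in this step closes the readers only at the end of the loop in the next step. As an alternative sketch, Java 7 and later can close them automatically with a try-with-resources statement, even when an exception is thrown. The class name OpenFileDemo and the helper firstLine are hypothetical, chosen only to keep the example small and runnable; the helper returns the file's first line, or null when the file is missing, empty or unreadable.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

// Sketch: open a file with try-with-resources so the readers are closed automatically.
// OpenFileDemo and firstLine are hypothetical names used only for illustration.
class OpenFileDemo {

    // Returns the first line of the file, or null if it is missing, empty or unreadable.
    static String firstLine(String path) {
        // Resources declared here are closed in reverse order when the block exits,
        // whether it finishes normally or throws.
        try (FileReader fr = new FileReader(path);
             BufferedReader br = new BufferedReader(fr);
             Scanner scan = new Scanner(br)) {
            return scan.hasNextLine() ? scan.nextLine() : null;
        } catch (FileNotFoundException fnfe) {
            // Thrown by the FileReader constructor when the file cannot be opened.
            System.err.println("File not found: " + fnfe.getMessage());
            return null;
        } catch (IOException ioe) {
            // Any other I/O failure, for example while closing the readers.
            System.err.println("I/O error: " + ioe.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine("testfile.txt"));
    }
}
```

Catching FileNotFoundException before IOException lets the program report a missing file separately from other read errors.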

    • 3

      Create a loop in your program to continue executing while the file has content to read. Add the following code inside your try block:

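      while(scan.hasNextLine()) {
          //process each line here
      }
      scan.close();

      This prepares your program to process the file one line at a time inside a while loop. The hasNextLine method returns true as long as at least one more line remains, which matches the nextLine call you will use inside the loop (hasNext, by contrast, checks for another whitespace-separated token and stops early if only whitespace remains). As soon as there is nothing left for the Scanner to read, the loop exits and the Scanner is closed; closing it also closes the underlying "BufferedReader" and "FileReader."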
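To see why the loop condition matters, the following sketch counts the lines read from the same text with hasNextLine() and with hasNext(). It reads from a String rather than a file so it runs anywhere, and the class and method names are hypothetical. Because hasNext() asks whether another whitespace-delimited token remains, it stops before a final line that holds only spaces.

```java
import java.util.Scanner;

// Sketch comparing two loop conditions for line-by-line reading.
// LoopConditionDemo and its methods are hypothetical names for illustration.
class LoopConditionDemo {

    // hasNextLine() paired with nextLine(): every line is read,
    // including blank and whitespace-only lines.
    static int countWithHasNextLine(String text) {
        int count = 0;
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNextLine()) {
                scan.nextLine();
                count++;
            }
        }
        return count;
    }

    // hasNext() paired with nextLine(): hasNext() checks for another token,
    // so the loop ends as soon as only whitespace is left.
    static int countWithHasNext(String text) {
        int count = 0;
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNext()) {
                scan.nextLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "a b\n\nc\n   \n"; // four lines; the last holds only spaces
        System.out.println(countWithHasNextLine(text)); // 4
        System.out.println(countWithHasNext(text));     // 3
    }
}
```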

    • 4

      Scan each line in your file. Inside your while loop, add the following code to scan a single line each time the loop executes:

      String thisLine = scan.nextLine();

      Every time the loop executes, the program will read the next line into a String variable. Once you have the line in your program, you can carry out any processing you need, including splitting it into tokens.
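A minimal sketch of this step, assuming the text comes from a StringReader instead of the file so that it runs without a file on disk; the class and method names are hypothetical. It collects each line in a list and shows that nextLine() strips the line terminator, whether the input uses "\n" or "\r\n" line endings, so no stray newline characters reach the tokenizing step.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Sketch: read every line with nextLine(). A StringReader stands in for the
// FileReader/BufferedReader pair; ReadLinesDemo is a hypothetical name.
class ReadLinesDemo {

    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scan = new Scanner(new StringReader(text))) {
            while (scan.hasNextLine()) {
                // nextLine() returns the line without its terminator
                // ("\n", "\r\n" or "\r").
                String thisLine = scan.nextLine();
                lines.add(thisLine);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readLines("first line\r\nsecond line\n")); // [first line, second line]
    }
}
```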

    • 5

      Tokenize your file lines. Still inside the while loop, add the following code directly after the line that reads the current line into a String variable:

      String[] lineTokens = thisLine.split(" ");

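      This example splits the line on the space character, so each array element contains whatever appears between spaces in the input line, typically single words. The split method takes a regular expression, and two consecutive spaces produce an empty string in the array; if your file may contain runs of spaces or tabs, thisLine.trim().split("\\s+") splits on any run of whitespace instead. Alter this line to suit your own file, and add any further processing for your tokens inside the while loop. The lineTokens array is declared inside the loop body, so each line's tokens are only accessible during that pass of the loop; to keep them afterwards, copy them into a collection declared before the loop.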
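The following sketch contrasts split(" ") with splitting on runs of whitespace, and keeps every line's tokens in a list declared before the loop so they remain available after it ends. The helper names tokenize and tokenizeAll are hypothetical, and reading from a String stands in for reading the file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

// Sketch: two ways to split a line, plus a list that keeps tokens after the loop.
// SplitDemo, tokenize and tokenizeAll are hypothetical names for illustration.
class SplitDemo {

    // Splits on runs of whitespace (spaces, tabs). Trimming first avoids an empty
    // leading token; a blank line gives an empty array.
    static String[] tokenize(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // The list is declared outside the loop, so every line's tokens
    // are still accessible once the loop has finished.
    static List<String[]> tokenizeAll(String text) {
        List<String[]> allTokens = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNextLine()) {
                allTokens.add(tokenize(scan.nextLine()));
            }
        }
        return allTokens;
    }

    public static void main(String[] args) {
        String line = "the  quick brown"; // two spaces after "the"
        System.out.println(Arrays.toString(line.split(" ")));   // [the, , quick, brown]
        System.out.println(Arrays.toString(tokenize(line)));    // [the, quick, brown]
        System.out.println(tokenizeAll("a b\n\n c\td ").size()); // 3
    }
}
```

Returning an empty array for blank lines is a design choice; code that reads lineTokens[0] should check the array length first.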

Tips & Warnings

  • Test your split code by printing the first element of the array each time the loop executes.

  • Test your program with the type of input data it will encounter once it is deployed. Real input files often contain surprises such as blank lines, extra spaces, tabs or missing files, so testing with realistic data is essential.
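One way to follow both tips is a small throwaway test, sketched below under the assumption that creating a temporary file is acceptable in your environment. It writes made-up sample lines, including double spaces, a tab and a blank line, to a temporary file, runs the tokenizing loop on it, and prints the first token of each line. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

// Sketch of a quick test harness. TokenizeFileTest and firstTokens are
// hypothetical names; the sample lines are invented for illustration.
class TokenizeFileTest {

    // Reads the file line by line and returns the first token of each non-blank line.
    static List<String> firstTokens(Path file) throws IOException {
        List<String> firsts = new ArrayList<>();
        try (Scanner scan = new Scanner(new BufferedReader(new FileReader(file.toFile())))) {
            while (scan.hasNextLine()) {
                String thisLine = scan.nextLine().trim();
                if (thisLine.isEmpty()) {
                    continue; // skip blank lines instead of producing an empty token
                }
                String[] lineTokens = thisLine.split("\\s+");
                System.out.println(lineTokens[0]); // print the first element on each pass
                firsts.add(lineTokens[0]);
            }
        }
        return firsts;
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("tokenize", ".txt");
        try {
            // Awkward cases: double spaces, a blank line, leading spaces and a tab.
            Files.write(sample, Arrays.asList("alpha beta", "gamma  delta", "", "  epsilon\tzeta"));
            System.out.println(firstTokens(sample)); // [alpha, gamma, epsilon]
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```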
