How to Manipulate Data in Perl

Perl is designed for reading and manipulating text data. Its three basic data types (scalars, arrays and hashes) are easy to work with because the interpreter converts values between strings and numbers as needed and evaluates expressions according to the context in which they appear. One of the most common formats to manipulate in Perl is comma-separated values (CSV).
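
As a brief illustration of the three data types and of context, consider the sketch below. The variable names and values are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $name   = "widget";                    # scalar: a single value
my @prices = (10, 25, 40);                # array: an ordered list of scalars
my %stock  = (widget => 3, gear => 7);    # hash: key/value pairs

# The same array gives different results depending on context.
my $count   = @prices;    # scalar context: the number of elements
my ($first) = @prices;    # list context: the first element

# A string that looks like a number is used as one in numeric context.
my $total = "20" + 5;

print "$name: $count prices, first is $first, total $total, in stock $stock{$name}\n";
```

Run as written, this should print "widget: 3 prices, first is 10, total 25, in stock 3".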

Instructions

    • 1

      Open the data file, unless you are reading data from STDIN or DATA. Use a filehandle name that is easy to identify, and check that the open succeeded:
      "open(CSVFILE, '<', 'test.csv') or die "Cannot open test.csv: $!";"

    • 2

      Read the data. With small files, you can read all the lines at once. With large files, you should read one line at a time and process it before reading the next line. Here, the angle-bracket operator is evaluated in list context, which reads every remaining line into the array @lines:
      "@lines = <CSVFILE>;"

    • 3

      Strip the newline from the end of every line. The newlines are not needed once all the lines have been read in. The chomp function accepts a list and modifies each element in place, so a single call covers the whole array:
      "chomp(@lines);"

    • 4

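      Split the lines. Each comma-separated line needs to be split into an array of fields, which can be done with the split and map functions. The map function calls the block passed as its first argument once for each element of the list, and the square brackets turn each list of fields into an array reference. The limit of -1 keeps trailing empty fields, so the number of columns is preserved. The fields will be rejoined in a similar fashion before the data is printed. Note that a plain split does not handle quoted fields that contain commas; for such data a dedicated parser, such as the Text::CSV module from CPAN, is the safer choice:
      "@lines = map { [ split /,/, $_, -1 ] } @lines;"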

    • 5

      Manipulate the data. Every line is now held in memory as an array reference, so individual fields are easy to reach.

    • 6

      Loop over the lines with the foreach keyword. The foreach keyword runs a block once for every element of an array, aliasing the default variable $_ to each element in turn. Because each element of @lines is an array reference, its fields are indexed with the -> operator. This example assumes the second field (array index 1) contains a numeric value, and adds 20 to it. Other manipulations, including arithmetic and string operations, can be performed in a similar way:
      "foreach (@lines) {
      $_->[1] += 20;
      }"

    • 7

      Rejoin the fields before writing the data back to a CSV file. Each array reference is dereferenced and its fields are joined with commas:
      "@lines = map { join(",", @$_) } @lines;"

    • 8

      Print the data. Again, you will loop over @lines, but this time only print each line. Here the lines are printed to STDOUT, but you could just as easily open another file and print to that:
      "foreach (@lines) {
      print $_ . "\n";
      }"
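
The steps above can be combined into one short script. The following is a minimal sketch, not a definitive implementation: the sample data and the variable names ($csv_text, @rows, @output) are assumptions made for illustration, and an in-memory filehandle stands in for test.csv so that the script runs without a file on disk.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for test.csv. Opening a
# reference to a string gives an in-memory filehandle.
my $csv_text = "apple,10,red\nbanana,25,yellow\ncherry,7,red\n";
open(my $in, '<', \$csv_text) or die "Cannot open in-memory data: $!";

my @lines = <$in>;    # list context: read every line at once
close($in);
chomp(@lines);        # strip the trailing newline from each line

# Split each line into an array reference of fields.
# The -1 limit keeps trailing empty fields.
my @rows = map { [ split /,/, $_, -1 ] } @lines;

# Add 20 to the second field (index 1) of every row.
$_->[1] += 20 for @rows;

# Rejoin the fields with commas, then print one line per row.
my @output = map { join(',', @$_) } @rows;
print "$_\n" for @output;
```

With the sample data shown, this should print apple,30,red, banana,45,yellow and cherry,27,red on separate lines. For large files, the same split, modify and join operations can be applied inside a while (my $line = <$in>) loop, which reads one line at a time instead of holding the whole file in memory.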

Tips & Warnings

  • Remember that inside the foreach loop the default variable $_ is not an array, but a reference to an array. To index the array, you first have to dereference the reference, which the -> operator does: $_->[1] is the second field of the current line.
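
To make the dereferencing concrete, here is a small sketch using a hypothetical row ($row and its values are assumptions for illustration) of the kind produced by splitting one CSV line.

```perl
use strict;
use warnings;

# Hypothetical row: a reference to an array of three fields.
my $row = [ 'apple', 10, 'red' ];

my $second = $row->[1];       # arrow form: index through the reference
my $same   = $$row[1];        # equivalent prefix form
my @fields = @$row;           # dereference the whole array (makes a copy)
my $n      = scalar @$row;    # number of fields

$row->[1] += 20;              # changes the referenced array in place

print "second=$second same=$same count=$n now=$row->[1] copy=$fields[1]\n";
```

This should print "second=10 same=10 count=3 now=30 copy=10". The copy in @fields is unaffected by the later change, while anything reached through $row sees the updated value.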
