The manual is available online. A reminder on AWK’s built-in variables.

Basics

Print the header row (first row) in a CSV file

$ head FuelConsumption.csv | awk 'NR==1 {print $0}'

MODELYEAR,MAKE,MODEL,VEHICLECLASS,ENGINESIZE,CYLINDERS,TRANSMISSION,FUELTYPE,FUELCONSUMPTION_CITY,FUELCONSUMPTION_HWY,FUELCONSUMPTION_COMB,FUELCONSUMPTION_COMB_MPG,CO2EMISSIONS

Print the data

$ head FuelConsumption.csv | awk 'NR>1 {print $0}'

2014,ACURA,ILX,COMPACT,2,4,AS5,Z,9.9,6.7,8.5,33,196
2014,ACURA,ILX,COMPACT,2.4,4,M6,Z,11.2,7.7,9.6,29,221
2014,ACURA,ILX HYBRID,COMPACT,1.5,4,AV7,Z,6,5.8,5.9,48,136
2014,ACURA,MDX 4WD,SUV - SMALL,3.5,6,AS6,Z,12.7,9.1,11.1,25,255
2014,ACURA,RDX AWD,SUV - SMALL,3.5,6,AS6,Z,12.1,8.7,10.6,27,244
2014,ACURA,RLX,MID-SIZE,3.5,6,AS6,Z,11.9,7.7,10,28,230
2014,ACURA,TL,MID-SIZE,3.5,6,AS6,Z,11.8,8.1,10.1,28,232
2014,ACURA,TL AWD,MID-SIZE,3.7,6,AS6,Z,12.8,9,11.1,25,255
2014,ACURA,TL AWD,MID-SIZE,3.7,6,M6,Z,13.4,9.5,11.6,24,267

Quick reminders,

  • NR row number.
  • NF number of fields in a row.

For regular expressions comparisons, use ~ and !~. You can also use operators such as && and ||:

$ head FuelConsumption.csv | awk -F, 'NR>1 && $12>30 {print $0}'

Data

The following data is the output of

$ kaggle competitions list > kagglcompetitions
$ cat kagglecompetitions

ref                                            deadline             category            reward  teamCount  userHasEntered  
---------------------------------------------  -------------------  ---------------  ---------  ---------  --------------  
contradictory-my-dear-watson                   2030-07-01 23:59:00  Getting Started     Prizes         96           False  
gan-getting-started                            2030-07-01 23:59:00  Getting Started     Prizes        203           False  
tpu-getting-started                            2030-06-03 23:59:00  Getting Started  Knowledge        462           False  
digit-recognizer                               2030-01-01 00:00:00  Getting Started  Knowledge       3149           False  
titanic                                        2030-01-01 00:00:00  Getting Started  Knowledge      25172           False  
house-prices-advanced-regression-techniques    2030-01-01 00:00:00  Getting Started  Knowledge       6305           False  
connectx                                       2030-01-01 00:00:00  Getting Started  Knowledge        562           False  
nlp-getting-started                            2030-01-01 00:00:00  Getting Started  Knowledge       1631           False  
competitive-data-science-predict-future-sales  2022-12-31 23:59:00  Playground           Kudos      10695           False  
jane-street-market-prediction                  2021-08-23 23:59:00  Featured          $100,000       4245           False  
hungry-geese                                   2021-07-26 23:59:00  Playground          Prizes        395           False  
bms-molecular-translation                      2021-06-02 23:59:00  Featured           $50,000        281           False  
iwildcam2021-fgvc8                             2021-05-26 23:59:00  Research         Knowledge          4           False  
herbarium-2021-fgvc8                           2021-05-26 23:59:00  Research         Knowledge         23           False  
plant-pathology-2021-fgvc8                     2021-05-26 23:59:00  Research         Knowledge         60           False  
hotel-id-2021-fgvc8                            2021-05-26 23:59:00  Research         Knowledge         20           False  
hashcode-2021-oqr-extension                    2021-05-25 23:59:00  Playground       Knowledge         90           False  
indoor-location-navigation                     2021-05-17 23:59:00  Research           $10,000        650           False  
hpa-single-cell-image-classification           2021-05-11 23:59:00  Featured           $25,000        374           False  
shopee-product-matching                        2021-05-10 23:59:00  Featured           $30,000        570           False  

Filtering

Find only the competitions that have a monetary reward, and sort them.

$ cat kagglecompetitions | awk '$5~/^\$/{print $0}' | sort -k5 -r

jane-street-market-prediction                  2021-08-23 23:59:00  Featured          $100,000       4245           False  
bms-molecular-translation                      2021-06-02 23:59:00  Featured           $50,000        281           False  
shopee-product-matching                        2021-05-10 23:59:00  Featured           $30,000        570           False  
hpa-single-cell-image-classification           2021-05-11 23:59:00  Featured           $25,000        374           False  
indoor-location-navigation                     2021-05-17 23:59:00  Research           $10,000        650           False  

Sorting

You can find more information here.

$ cat kagglecompetitions | sed  -n '3,$ p' | sort -k2n,2

bms-molecular-translation                      2021-06-02 23:59:00  Featured           $50,000        281           False  
hashcode-2021-oqr-extension                    2021-05-25 23:59:00  Playground       Knowledge         90           False  
herbarium-2021-fgvc8                           2021-05-26 23:59:00  Research         Knowledge         23           False  
hotel-id-2021-fgvc8                            2021-05-26 23:59:00  Research         Knowledge         20           False  
hpa-single-cell-image-classification           2021-05-11 23:59:00  Featured           $25,000        374           False  
hungry-geese                                   2021-07-26 23:59:00  Playground          Prizes        395           False  
indoor-location-navigation                     2021-05-17 23:59:00  Research           $10,000        650           False  
iwildcam2021-fgvc8                             2021-05-26 23:59:00  Research         Knowledge          4           False  
jane-street-market-prediction                  2021-08-23 23:59:00  Featured          $100,000       4245           False  
plant-pathology-2021-fgvc8                     2021-05-26 23:59:00  Research         Knowledge         60           False  
shopee-product-matching                        2021-05-10 23:59:00  Featured           $30,000        570           False  
competitive-data-science-predict-future-sales  2022-12-31 23:59:00  Playground           Kudos      10695           False  
connectx                                       2030-01-01 00:00:00  Getting Started  Knowledge        562           False  
contradictory-my-dear-watson                   2030-07-01 23:59:00  Getting Started     Prizes         96           False  
digit-recognizer                               2030-01-01 00:00:00  Getting Started  Knowledge       3149           False  
gan-getting-started                            2030-07-01 23:59:00  Getting Started     Prizes        203           False  
house-prices-advanced-regression-techniques    2030-01-01 00:00:00  Getting Started  Knowledge       6305           False  
nlp-getting-started                            2030-01-01 00:00:00  Getting Started  Knowledge       1631           False  
titanic                                        2030-01-01 00:00:00  Getting Started  Knowledge      25172           False  
tpu-getting-started                            2030-06-03 23:59:00  Getting Started  Knowledge        462           False  

The following command does the following

  • Read the csv.csv which contains some financial transactions in CSV format
  • print the 1st and 3rd fields as tab separated values (remember that you can use the column -t -s, command to do this).
  • truncate the 3rd column
$ awk -F',' 'BEGIN{ OFS="\t" } { print $1,$3 }' csv.csv | sed  's/\s\(...\).*/  \1***/g'

07/03/2021  PHR***
07/03/2021  Ama***
07/03/2021  PAY***
07/03/2021  PAY***
06/03/2021  PAY***
06/03/2021  PAY***
06/03/2021  PAY***
06/03/2021  PAY***
06/03/2021  PAY***
06/03/2021  PAY***
...

Summation

Sum the transactions in the 5th field

$ cat csv.csv |cut -d, -f5 | gawk '{ sum += $1 }; END { print sum }'

-1895.51