SQLCell 2.0: Redesigning SQLCell for JupyterLab

14 minute read | Updated:

I've been meaning to revisit <a href="https://github.com/tmthyjames/SQLCell">SQLCell</a> for some time. Besides the codebase being a product of my early days of learning how to program and that making contributions was cumbersome due to the poor design of the API, the real force behind me refactoring was that I've been using Jupyter Lab a lot lately and I still hate most SQL interfaces. I like the freedom of being able to use Jupyter as a pseudo-SQL app—there's not as much functionality as a fully fledged SQL interface, but for most of my needs, I just need to write queries; I don't need the same level of utility as a DBA would. 
Therefore, I'm refactoring SQLCell completely to work with both Jupyter Notebook and Jupyter Lab. The major differences in this release are:
    1) No UI buttons or graphs (yet)
    2) The API is a lot cleaner
    
Back in the day of Jupyter Notebooks, you could publish a notebook that injected malicious Javascript onto a user's computer. Now, <a href="https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/JavaScript%20Notebook%20Extensions.html">that is no longer (publicly) possible</a>. I say public, because there's not a public Javascript API for Lab like there is for Notebook. This comes with its benefits and costs, as is reflected in many Github conversations (<a href="https://github.com/jupyterlab/jupyterlab/issues/3118">here</a> and <a href="https://github.com/jupyterlab/jupyterlab/issues/3748">here</a> and <a href="https://github.com/jupyterlab/jupyterlab/pull/2595">here</a>). With that, I am still looking into building graphs and UI features into a future version. And hopefully this release will attract more contributors as the API is much cleaner than version 1's.
Here are the features I'm starting with:
>### [Engine Persistence](#Engine-Persistence)
>### [Python Variables as Query Parameters](#Python-Variables-as-Query-Parameters)
>### [Engine Aliasing](#Engine-Aliasing)
>### [Defining and Persisting Hooks](#Defining-and-Persisting-Hooks)
>### [Write SQL Output to Session Variable](#Write-SQL-Output-to-Session-Variable)
>### [Background Queries](#Background-Queries)
To install SQLCell use `pip`:

I've been meaning to revisit SQLCell for some time. Besides the codebase being a product of my early days of learning how to program and that making contributions was cumbersome due to the poor design of the API, the real force behind me refactoring was that I've been using Jupyter Lab a lot lately and I still hate most SQL interfaces. I like the freedom of being able to use Jupyter as a pseudo-SQL app—there's not as much functionality as a fully fledged SQL interface, but for most of my needs, I just need to write queries; I don't need the same level of utility as a DBA would.

Therefore, I'm refactoring SQLCell completely to work with both Jupyter Notebook and Jupyter Lab. The major differences in this release are:

1) No UI buttons or graphs (yet)
2) The API is a lot cleaner

Back in the day of Jupyter Notebooks, you could publish a notebook that injected malicious Javascript onto a user's computer. Now, that is no longer (publicly) possible. I say public, because there's not a public Javascript API for Lab like there is for Notebook. This comes with its benefits and costs, as is reflected in many Github conversations (here and here and here). With that, I am still looking into building graphs and UI features into a future version. And hopefully this release will attract more contributors as the API is much cleaner than version 1's.

Here are the features I'm starting with:

Engine Persistence

Python Variables as Query Parameters

Engine Aliasing

Defining and Persisting Hooks

Write SQL Output to Session Variable

Background Queries

To install SQLCell use pip:

pip install sqlcell
Then load it in your Notebook/Lab by using the `%load_ext` command:

Then load it in your Notebook/Lab by using the %load_ext command:

In [1]:
%load_ext sqlcell.sqlcell
## Engine Persistence

Engine Persistence

The first time you use `%%sql`, you'll need to pass your connection string as an `--engine` argument.

The first time you use %%sql, you'll need to pass your connection string as an --engine argument.

In [19]:
%%sql --engine=mssql+pymssql://SomeUserName:SomePassword@SomeHostName/tdobbins
SELECT TOP 2 N1.*
FROM NashvilleHousingMSSQL N1
JOIN NashvilleHousingMSSQL N2 ON N2.ParcelID = N1.ParcelID
ORDER BY Address DESC
Out[19]:
Acreage Address Bedrooms BuildingValue City ExteriorWall FinishedArea FoundationType FullBath Grade ... SalePrice SoldAsVacant State Suite/Condo# TaxDistrict TotalValue Unnamed:0 Unnamed:0.1 YearBuilt image
0 0.11 804 LENA ST None 0.0 NASHVILLE None None None None None ... 16000 No TN None URBAN SERVICES DISTRICT 9000.0 87 87 None None
1 0.11 804 LENA ST None 0.0 NASHVILLE None None None None None ... 16000 No TN None URBAN SERVICES DISTRICT 9000.0 87 87 None None

2 rows × 31 columns

After that, SQLCell will create a few aliases for you to easily refer to the engine you're wanting to run against. Now you can pass the entire connection string or either the hostname or the database name, and `%%sql` will use the same engine for all three. Note, if you have two databases with the same name, consider just using the hostname.

After that, SQLCell will create a few aliases for you to easily refer to the engine you're wanting to run against. Now you can pass the entire connection string or either the hostname or the database name, and %%sql will use the same engine for all three. Note, if you have two databases with the same name, consider just using the hostname.

In [3]:
%%sql --engine=SomeHostName
        
SELECT TOP 2 * 
FROM NashvilleHousingMSSQL
Out[3]:
Acreage Address Bedrooms BuildingValue City ExteriorWall FinishedArea FoundationType FullBath Grade ... SalePrice SoldAsVacant State Suite/Condo# TaxDistrict TotalValue Unnamed:0 Unnamed:0.1 YearBuilt image
0 0.76 5316 OVERTON RD 4.0 300400.0 NASHVILLE BRICK 2370.0 FULL BSMT 2.0 B ... 267300 No TN None URBAN SERVICES DISTRICT 355400.0 24 24 1968.0 \184000\836001.JPG
1 NaN None NaN NaN None None NaN None NaN None ... 207000 No None None None NaN 59 59 NaN None

2 rows × 31 columns

`%%sql` will also remember the last used engine so you don't have to refer to it everytime you run a query.

%%sql will also remember the last used engine so you don't have to refer to it everytime you run a query.

In [15]:
%%sql
        
SELECT TOP 2 * 
FROM NashvilleHousingMSSQL
Out[15]:
Acreage Address Bedrooms BuildingValue City ExteriorWall FinishedArea FoundationType FullBath Grade ... SalePrice SoldAsVacant State Suite/Condo# TaxDistrict TotalValue Unnamed:0 Unnamed:0.1 YearBuilt image
0 0.76 5316 OVERTON RD 4.0 300400.0 NASHVILLE BRICK 2370.0 FULL BSMT 2.0 B ... 267300 No TN None URBAN SERVICES DISTRICT 355400.0 24 24 1968.0 \184000\836001.JPG
1 NaN None NaN NaN None None NaN None NaN None ... 207000 No None None None NaN 59 59 NaN None

2 rows × 31 columns

## Python Variables as Query Parameters

Python Variables as Query Parameters

If you want to pass Python variables to your query, you can use the colon (`foo = :foo`) syntax (similar to SQLAlchemy's usage).

If you want to pass Python variables to your query, you can use the colon (foo = :foo) syntax (similar to SQLAlchemy's usage).

In [14]:
foo = 'NASHVILLE'
bar = 132000
baz = ('SINGLE FAMILY', 'RESIDENTIAL CONDO')
In [5]:
%%sql --engine=SomeHostName
        
select * 
from NashvilleHousingMSSQL
where PropertyCity = :foo
    and SalePrice like :bar
    and LandUse in :baz
Out[5]:
Acreage Address Bedrooms BuildingValue City ExteriorWall FinishedArea FoundationType FullBath Grade ... SalePrice SoldAsVacant State Suite/Condo# TaxDistrict TotalValue Unnamed:0 Unnamed:0.1 YearBuilt image
0 None None None None None None None None None None ... 132000 No None 8 None None 0.0 0.0 None None
1 None None None None None None None None None None ... 132000 No None 8 None None NaN NaN None None

2 rows × 31 columns

And the same with a Postgres dialect:

And the same with a Postgres dialect:

In [7]:
foo = 'NASHVILLE'
bar = 132000
baz = ('SINGLE FAMILY', 'RESIDENTIAL CONDO')
In [8]:
%%sql --engine=PG
select n2.* 
from "NashvilleHousingPostgres" n1
JOIN "NashvilleHousingPostgres" n2 on n2."ParcelID" = n1."ParcelID"
where n2."PropertyCity" = :foo
    and n2."SalePrice" = :bar
    and n2."LandUse" in :baz
Out[8]:
Acreage Address Bedrooms BuildingValue City ExteriorWall FinishedArea FoundationType FullBath Grade ... PropertyCity SaleDate SalePrice SoldAsVacant State Suite/Condo# TaxDistrict TotalValue YearBuilt image
0 None None None None None None None None None None ... NASHVILLE 2013-01-24 132000 No None 8 None None None None

1 rows × 29 columns

## Engine Aliasing

Engine Aliasing

Similar to <a href="https://github.com/tmthyjames/SQLCell#declare-engines">version 1</a>, you can also alias engine strings. Unlike version 1, there are no UI buttons (yet). Using the following syntax:

Similar to version 1, you can also alias engine strings. Unlike version 1, there are no UI buttons (yet). Using the following syntax:

In [5]:
%%sql --engines
PG=postgresql://username:@localhost/db
MSS=mssql+pymssql://SomeUserName:SomePassword@180.36.3.76/tdobbins
LC=mssql+pymssql://username:password@180.36.2.90/DB1
Out[5]:
'Engines successfully registered'
we can now use `%%sql` with an alias as our `--engine` parameter:

we can now use %%sql with an alias as our --engine parameter:

In [6]:
%%sql --engine=MSS
select top 2 * from NashvilleHousingMSSQL
Out[6]:
Acreage Address Bedrooms BuildingValue City ExteriorWall FinishedArea FoundationType FullBath Grade ... SalePrice SoldAsVacant State Suite/Condo# TaxDistrict TotalValue Unnamed:0 Unnamed:0.1 YearBuilt image
0 0.76 5316 OVERTON RD 4.0 300400.0 NASHVILLE BRICK 2370.0 FULL BSMT 2.0 B ... 267300 No TN None URBAN SERVICES DISTRICT 355400.0 24 24 1968.0 \184000\836001.JPG
1 NaN None NaN NaN None None NaN None NaN None ... 207000 No None None None NaN 59 59 NaN None

2 rows × 31 columns

To view all your engines, use the `list` argument:

To view all your engines, use the list argument:

In [7]:
%%sql --engines
list
Out[7]:
Alias Engine
0 PG postgresql://username:@localhost/db
1 MSS mssql+pymssql://SomeUserName:SomePassword@17...
2 LC mssql+pymssql://username:password@180.36.2.9...
To clear your engines, use the `--refresh` command like so:

To clear your engines, use the --refresh command like so:

In [4]:
%%sql --refresh
engines
Out[4]:
'Removed all records from engines'
## Defining and Persisting Hooks

Defining and Persisting Hooks

One thing I tried to do with version 1 is mimic `psql` commands (since I was using postgres exclusively), but doing that forced limitations on users who used other DBs. With this version, I'm removing those limitations by allowing users to register their own often-run shortcut queries like so:

One thing I tried to do with version 1 is mimic psql commands (since I was using postgres exclusively), but doing that forced limitations on users who used other DBs. With this version, I'm removing those limitations by allowing users to register their own often-run shortcut queries like so:

In [3]:
%%sql --hook
tables MSS = USE {0};
SELECT * FROM sys.Tables;
columns_mss MSS = EXEC sp_columns {0};
columns_db1 DB1 = EXEC sp_columns {0};
views mssql+pymssql://username:password@hostname/DB = USE {0};
SELECT * FROM sys.Views;
tables_db1 DB1 = USE {0};
SELECT * FROM sys.Tables;
views_l mssql+pymssql://:@localhost/DB = USE {0};
SELECT * FROM sys.Views;
databases LC = SELECT name FROM master.sys.databases;
Out[3]:
'Hook successfully registered'
The syntax for defining hooks is:
> ```
<name> <engine|alias> = <sql command>```
Take this example:
>```
tables MSS = USE {0};
SELECT * FROM sys.Tables;
```
where 
`tables` is our command shortcut.
`MSS` is an engine alias we have already defined with `--engines`.
and 
    
```
USE {0};
SELECT * FROM sys.Tables;
```
is the comand we want to run when we issue our shortcut.
    
To use your newly-defined shortcuts, prepend `~` to your shortcut name:

The syntax for defining hooks is:

<name> <engine|alias> = <sql command>

Take this example:

tables MSS = USE {0};
SELECT * FROM sys.Tables;

where

tables is our command shortcut.

MSS is an engine alias we have already defined with --engines.

and

USE {0};
SELECT * FROM sys.Tables;

is the comand we want to run when we issue our shortcut.

To use your newly-defined shortcuts, prepend ~ to your shortcut name:

In [8]:
%%sql
~columns_mss NashvilleHousingMSSQL
Out[8]:
CHAR_OCTET_LENGTH COLUMN_DEF COLUMN_NAME DATA_TYPE IS_NULLABLE LENGTH NULLABLE ORDINAL_POSITION PRECISION RADIX REMARKS SCALE SQL_DATA_TYPE SQL_DATETIME_SUB SS_DATA_TYPE TABLE_NAME TABLE_OWNER TABLE_QUALIFIER TYPE_NAME
0 NaN None Unnamed:0 -5 YES 8 1 1 19 10.0 None 0.0 -5 None 108 NashvilleHousingMSSQL dbo tdobbins bigint
1 NaN None Unnamed:0.1 -5 YES 8 1 2 19 10.0 None 0.0 -5 None 108 NashvilleHousingMSSQL dbo tdobbins bigint
2 2.147484e+09 None ParcelID -1 YES 2147483647 1 3 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
3 2.147484e+09 None LandUse -1 YES 2147483647 1 4 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
4 2.147484e+09 None PropertyAddress -1 YES 2147483647 1 5 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
5 2.147484e+09 None Suite/Condo# -1 YES 2147483647 1 6 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
6 2.147484e+09 None PropertyCity -1 YES 2147483647 1 7 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
7 2.147484e+09 None SaleDate -1 YES 2147483647 1 8 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
8 NaN None SalePrice -5 YES 8 1 9 19 10.0 None 0.0 -5 None 108 NashvilleHousingMSSQL dbo tdobbins bigint
9 2.147484e+09 None LegalReference -1 YES 2147483647 1 10 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
10 2.147484e+09 None SoldAsVacant -1 YES 2147483647 1 11 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
11 2.147484e+09 None MultipleParcelsInvolvedinSale -1 YES 2147483647 1 12 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
12 2.147484e+09 None OwnerName -1 YES 2147483647 1 13 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
13 2.147484e+09 None Address -1 YES 2147483647 1 14 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
14 2.147484e+09 None City -1 YES 2147483647 1 15 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
15 2.147484e+09 None State -1 YES 2147483647 1 16 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
16 NaN None Acreage 6 YES 8 1 17 15 10.0 None NaN 6 None 109 NashvilleHousingMSSQL dbo tdobbins float
17 2.147484e+09 None TaxDistrict -1 YES 2147483647 1 18 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
18 NaN None Neighborhood 6 YES 8 1 19 15 10.0 None NaN 6 None 109 NashvilleHousingMSSQL dbo tdobbins float
19 2.147484e+09 None image -1 YES 2147483647 1 20 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
20 NaN None LandValue 6 YES 8 1 21 15 10.0 None NaN 6 None 109 NashvilleHousingMSSQL dbo tdobbins float
21 NaN None BuildingValue 6 YES 8 1 22 15 10.0 None NaN 6 None 109 NashvilleHousingMSSQL dbo tdobbins float
22 NaN None TotalValue 6 YES 8 1 23 15 10.0 None NaN 6 None 109 NashvilleHousingMSSQL dbo tdobbins float
23 NaN None FinishedArea 6 YES 8 1 24 15 10.0 None NaN 6 None 109 NashvilleHousingMSSQL dbo tdobbins float
24 2.147484e+09 None FoundationType -1 YES 2147483647 1 25 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
25 NaN None YearBuilt 6 YES 8 1 26 15 10.0 None NaN 6 None 109 NashvilleHousingMSSQL dbo tdobbins float
26 2.147484e+09 None ExteriorWall -1 YES 2147483647 1 27 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
27 2.147484e+09 None Grade -1 YES 2147483647 1 28 2147483647 NaN None NaN -1 None 39 NashvilleHousingMSSQL dbo tdobbins text
28 NaN None Bedrooms 6 YES 8 1 29 15 10.0 None NaN 6 None 109 NashvilleHousingMSSQL dbo tdobbins float
29 NaN None FullBath 6 YES 8 1 30 15 10.0 None NaN 6 None 109 NashvilleHousingMSSQL dbo tdobbins float
30 NaN None HalfBath 6 YES 8 1 31 15 10.0 None NaN 6 None 109 NashvilleHousingMSSQL dbo tdobbins float
Since we defined `columns_mss` as having an argument, `{0}`, we can pass a table name to it.

Since we defined columns_mss as having an argument, {0}, we can pass a table name to it.

To view your hooks, use `list`:

To view your hooks, use list:

In [10]:
%%sql --hook
list
Out[10]:
Alias Engine Hook
0 tables mssql+pymssql://SomeUserName:SomePassword@17... USE {0};\nSELECT * FROM sys.Tables;
1 columns_mss mssql+pymssql://SomeUserName:SomePassword@17... EXEC sp_columns {0};
2 columns_db1 mssql+pymssql://username:password@180.36.2.9... EXEC sp_columns {0};
3 views mssql+pymssql://username:password@180.36.2.9... USE {0};\nSELECT * FROM sys.Views;
4 tables_db1 mssql+pymssql://username:password@180.36.2.9... USE {0};\nSELECT * FROM sys.Tables;
5 views_l mssql+pymssql://:@localhost/DB1 USE {0};\nSELECT * FROM sys.Views;
6 databases mssql+pymssql://username:password@180.36.2.9... SELECT name FROM master.sys.databases;
And to clear your hook list, just run `--refresh` with `hooks` as the cell content, like we did for engines:

And to clear your hook list, just run --refresh with hooks as the cell content, like we did for engines:

In [2]:
%%sql --refresh
hooks
Out[2]:
'Removed all records from hooks'
## Write SQL Output to Session Variable

Write SQL Output to Session Variable

To write the output of your query to a Python variable, use the `--var` argument.

To write the output of your query to a Python variable, use the --var argument.

In [6]:
%%sql --engine=LC --var=df
select 1 as some_lc_column
Out[6]:
some_lc_column
0 1
In [7]:
df
Out[7]:
some_lc_column
0 1
## Background Queries

Background Queries

To run queries in the background, use the `--background` flag:

To run queries in the background, use the --background flag:

In [8]:
%%sql --engine=LC --var=another_df --background
select 1 as some_lc_column
Notice here that `another_df` is undefined since the query is still running:

Notice here that another_df is undefined since the query is still running:

In [9]:
another_df
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-9-135decb41fec> in <module>
----> 1 another_df

NameError: name 'another_df' is not defined

As soon as the query is done, you can call `another_df` successfully.

As soon as the query is done, you can call another_df successfully.

In [11]:
another_df
Out[11]:
some_lc_column
0 1
I'm working on a way to notify the user when a query finishes in the background. I'm also working on adding UI buttons using ipywidgets. Enjoy!

I'm working on a way to notify the user when a query finishes in the background. I'm also working on adding UI buttons using ipywidgets. Enjoy!