Connect to Dask from Outside Saturn Cloud
What if you’d like to just connect directly from your laptop to a Dask cluster, instead of using a Jupyter server at all? Saturn Cloud lets you do this too!
Create the Environment
To use this feature, you’ll need several Dask and Saturn Cloud Python packages installed at the same versions as those on the Dask cluster in Saturn Cloud. To keep the versioning simple, we recommend you create a conda environment with the right specifications, as shown below. (Run this set of commands in your terminal.)
conda create -n dask-saturn dask=2.30.0 distributed=2.30.1 python=3.7
conda activate dask-saturn
pip install dask-saturn==0.3.0
Now you have the environment all set to go!
Sometimes we find that people’s local environments also have old versions of pandas, and this can be an issue. Check that your version of pandas is the one you want.
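One way to check what is actually installed is to print the versions from inside the activated environment. The following is a minimal sketch, not an official Saturn Cloud tool: the helper name installed_versions is hypothetical, and it assumes each package exposes a __version__ attribute (it reports "unknown" if one does not).

```python
import importlib


def installed_versions(module_names):
    """Return {module name: version string, or None if it cannot be imported}."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not installed in this environment.
            versions[name] = None
            continue
        # Fall back to "unknown" when the module has no __version__ attribute.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    packages = ["dask", "distributed", "dask_saturn", "pandas"]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```

Compare the printed versions against the ones used by your Dask cluster in Saturn Cloud.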
Creating a Saturn Cloud resource
If you don’t have a Saturn Cloud account, go to saturncloud.io and click “Start For Free” in the upper right corner. It’ll ask you to create a login. Otherwise, log into Saturn Cloud. Once you have done so, you’ll be brought to the Saturn Cloud resources page. Click “New Jupyter Server”.
Give the resource a name (e.g., “external-connect-demo”); you can leave all other settings at their defaults. In the future you may want to set a specific image or instance size, which you can do from the resource page. Then click “Create”.
After the resource is created, you’ll be brought to its page. Next, we need to add a Dask cluster to this resource. Press the New Dask Cluster button, which will pop up a dialog for setting up the Dask cluster. Choose the size of each worker, the number of workers, and other options for the Dask cluster (see Create a Dask Cluster for details on those), then click Create.
Once the Dask cluster is created you’ll see it has a Connect Externally button, which provides instructions for making the external connection.
First, ensure that the client connecting to the Dask cluster has the appropriate libraries, in particular the version of dask-saturn shown by the UI. You’ll also want to include dask and distributed, ideally with the same versions as those in the cluster.
Next, set the SATURN_BASE_URL and SATURN_TOKEN environment variables on the client machine to the values shown in the dialog. These let Saturn Cloud know which particular Dask cluster to connect to.
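Before connecting, it can help to confirm that both variables are visible to the Python process. This is a small sketch under that assumption; the helper missing_saturn_vars is a hypothetical name used for illustration and is not part of dask-saturn.

```python
import os

# The two variables named in the Connect Externally dialog.
REQUIRED_VARS = ("SATURN_BASE_URL", "SATURN_TOKEN")


def missing_saturn_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_saturn_vars()
    if missing:
        print("Set these before connecting:", ", ".join(missing))
    else:
        print("SATURN_BASE_URL and SATURN_TOKEN are both set.")
```

If either name is reported as missing, set it in the shell or process that launches Python, then run the check again.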
Finally, from within the client machine you can then connect to the Dask cluster from Python:
from dask_saturn import SaturnCluster
from dask.distributed import Client
cluster = SaturnCluster()
client = Client(cluster)
client
Run the chunk, and soon you’ll see lines like this:
#> INFO:dask-saturn:Starting cluster. Status: pending
This tells you that your cluster is starting up! Eventually you’ll see something like:
#> INFO:dask-saturn:{'tcp://10.0.23.16:43141': {'status': 'OK'}}
This message means your cluster is up and ready to use. Now you can interact with it the same way you would from a Saturn Cloud Jupyter server. If you need help with that, please check out some of our tutorials, such as Training a Model with Scikit-learn and Dask, or the dask-saturn API.
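As a quick smoke test that work really runs on the cluster, you could submit a small computation through the client. The sketch below is illustrative only: square and sum_of_squares are hypothetical helper names, and the function expects to be handed the client object created in the connection snippet above.

```python
def square(x):
    """Toy task that runs on a Dask worker."""
    return x * x


def sum_of_squares(client, numbers):
    """Square each number on the cluster, then add the results there too."""
    # client.map schedules one task per input and returns a list of futures.
    futures = client.map(square, numbers)
    # Submitting sum over the futures keeps the intermediate results on the
    # cluster; only the final total is sent back to the local machine.
    total = client.submit(sum, futures)
    return total.result()
```

With the client from the connection step, sum_of_squares(client, range(10)) should return 285.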
Places to connect to Saturn Cloud
Not only can you connect to Saturn Cloud from your laptop or local machine, but you can connect from other cloud-based notebooks. Check out instructions for connecting from Google Colab, SageMaker, and Azure.