In this article I will explain why I switched from Jupyter to Polynote, and why I think you should too. Polynote is a polyglot notebook with first-class Scala support.
This article follows a talk given at Lunatech Labs and has two parts. First, we will explain what a notebook is and how we can use it in our everyday developer life. Then, we will discover Polynote, a promising open-source notebook solution with first-class Scala support. All the resources and examples are available here.
Notebook and its use cases
A notebook is composed of two parts (figure 1):
A web interface to write your code or text (in Markdown in general)
A server that runs a kernel to evaluate your code
You can write in any language as long as it is supported by the kernel running on the server side. Here is a list of all the different kernels available for Jupyter notebook. I am pretty sure you can find your favorite language in this list and start using it in a notebook! Once your code is ready, you can evaluate the cell. The code is sent to the server and, once it has been processed, the server returns the response to the web interface. Finally, the web interface wraps the result in some HTML to display it nicely. That is the general concept. Each notebook interface adds its own features to provide the best end-user experience.
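To make this round trip concrete, here is a minimal sketch in Python. It is not how Jupyter or Polynote are implemented: the function names `evaluate_cell` and `render_result` and the `cell-output` CSS class are made up for the illustration, and the "server" and the "web interface" live in the same process.

```python
import html
import io
from contextlib import redirect_stdout


def evaluate_cell(source: str, state: dict) -> str:
    """Hypothetical server-side step: run the cell and capture what it prints."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        exec(source, state)  # the "kernel" evaluates the code it received
    return buffer.getvalue()


def render_result(output: str) -> str:
    """Hypothetical client-side step: wrap the raw output in HTML for display."""
    return f'<pre class="cell-output">{html.escape(output)}</pre>'


state = {}                                  # variables kept alive between cells
raw = evaluate_cell("print(1 + 2)", state)  # what the server sends back
page_fragment = render_result(raw)          # what the browser would display
print(page_fragment)
```

A real kernel does much more (messaging protocol, rich outputs, interruption), but the evaluate-then-render loop is the part every notebook shares.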
My use cases
I started using notebooks (with Jupyter) during my university classes, for practical exercises.
It was a convenient way for our teachers to prepare the lessons.
Indeed, a teacher can put the code, the questions, and some explanations in the same file.
Then, as a student, all you have to do is to install the notebook interface and start working on the teacher’s file.
In my school, we used Jupyter.
You can install it with a single pip command and you are ready to start.
I recently used the same concept to organize workshops, as you can see with the workshop Introduction to Tensorflow in Scala.
I made a Dockerfile with everything needed to start the workshop.
It was a great experience, and it made me realize that notebooks could also be useful in my everyday life as a developer.
During my graduation internship, I worked on a research project with my supervisor. The difficulty was to stay synchronized, because we were not working in the same city. To solve this problem, we used notebooks extensively to interact with each other on the code. In this case, a notebook brings various advantages. For example, you can:
easily share one file which contains everything you need to run it
explain each cell using Markdown, images and LaTeX
run the cells and save the results. For example, with Machine Learning, a cell can take a lot of time to run, so it was an efficient way to share our findings without having to run everything over again
experiment with a solution in multiple languages by simply adding kernels on the server side
Notebooks are popular for these reasons and many more. There are many ways to start using a notebook on your machine. In the next part, I will show you why I switched completely to Polynote and why you should try it!
Now let’s discuss what I think are the two biggest advantages that Polynote provides. If you can, take a look at the attached notebooks at the same time you read this section since they contain more explanations.
The main advantage of Polynote is that it is polyglot. You can use different languages in the same notebook without changing the kernel. For the moment, Polynote supports 5 languages:
Picking the right language for the task at hand greatly improves your experience and performance.
As you can see in figure 3, we can easily share Scala data into Python and vice versa.
In this example, we receive a list of weather data from the OpenWeather API; we can see that the Scala variable datas can be used transparently in Python.
These interactions come with some restrictions (for example, it is easier to use case classes).
In addition, you have great interactions between Spark DataFrame and Python Pandas DataFrame.
Polynote wraps the results in HTML and adds some additional visualization features for specific types. Spark DataFrame and Pandas DataFrame have many options for that.
For example, in the output of a Spark DataFrame or Pandas DataFrame, you have a summary of your DataFrame and two buttons with additional options (the icons with a blue circle in figure 4). These buttons open the popup that you can see in figure 5.
In my opinion, the most interesting options are View data and Plot data. In the former, you can display all your data in a spreadsheet. In the latter, you can easily plot the data by simply selecting the axes and the type of plot you want as you can see in figure 6. It will generate the corresponding block of Vega code for you.
Vega is a declarative language that allows you to create a lot of different designs as you can see in their examples. If you do not want to use Vega, you can add other plotting libraries such as Matplotlib in Python. But I advise you to try Vega and their examples because you can make powerful and fancy plots to identify edge cases in your data (figure 7). Moreover, Vega works out of the box with Polynote.
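To give an idea of what "declarative" means here, the sketch below builds a small bar chart specification as a Python dictionary and prints it as JSON. It is written in the Vega-Lite flavor of Vega, which is an assumption on my side, and the rows and field names (`city`, `temp`) are made up; the block generated by Polynote for your own data will look different.

```python
import json

# Hypothetical sample rows; the field names are made up for this sketch.
rows = [
    {"city": "Paris", "temp": 12.5},
    {"city": "Rotterdam", "temp": 9.0},
    {"city": "Lisbon", "temp": 18.2},
]

# You describe WHAT to draw (data, mark, encodings), not HOW to draw it.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    "mark": "bar",
    "encoding": {
        "x": {"field": "city", "type": "nominal"},
        "y": {"field": "temp", "type": "quantitative"},
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Changing the plot then comes down to editing the description, for example replacing the mark or swapping the fields mapped to each axis.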
The last thing I want to talk about in this section is the WYSIWYG editor (figure 8).
It looks like a small feature but it is useful when you need to style your Markdown snippets and you do not know much about Markdown syntax.
In this menu, you can also open the LaTeX editor (figure 9) to write your formulas in an interactive way:
All these features allow you to make your notebooks understandable and maintainable over time.
Its polyglot support and its many visualization features made me prefer Polynote. If you are still not on board, it offers other important features that we will see next.
a. Order is important
With Jupyter, all cells work with the same global state. If you work with a big notebook, you can easily mess up the order of your logic (figure 10). If you want your Jupyter notebooks to be organized and maintainable, you have to manage everything yourself and be very rigorous if you collaborate with other people on the same notebook.
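Here is a small simulation of that problem in plain Python. It is my own illustration, not the exact content of the figure: each string stands for one cell, and a single dictionary plays the role of the kernel's global state.

```python
# Cells in the order they appear on the page (made-up content).
cells = [
    "total = price * 2",  # cell 1 uses `price`...
    "price = 10",         # ...which is only defined in cell 2
]

# One shared namespace for every cell, like a single kernel session.
shared_state = {}

# The user happens to run cell 2 first, then cell 1: everything seems fine.
exec(cells[1], shared_state)
exec(cells[0], shared_state)
print(shared_state["total"])  # prints 20

# A colleague opens the same notebook and runs it from top to bottom.
fresh_state = {}
try:
    for source in cells:
        exec(source, fresh_state)
    rerun_ok = True
except NameError:
    rerun_ok = False
print(rerun_ok)  # prints False
```

The notebook worked on one machine only because of the order in which the cells were run, and nothing in the file records that order.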
Polynote does not use a global state. Each cell has its state defined by all the cells above. As indicated in the documentation:
“This is a powerful way to enforce reproducibility in a notebook; it is far more likely that you’ll be able to re-run the notebook from top to bottom if later cells can’t affect earlier cells.”
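The toy model below sketches this idea in Python. It is only my interpretation of the behavior described in the documentation, not Polynote's implementation: the class `PositionalNotebook` and its methods are hypothetical, and each cell is given only the names produced by the cells above it.

```python
class PositionalNotebook:
    """Toy model: a cell only sees names defined by the cells above it."""

    def __init__(self, cells):
        self.cells = cells
        # Names each cell produced the last time it ran successfully.
        self.defined = [{} for _ in cells]

    def symbol_table(self, index):
        """Names visible to cell `index`: everything defined by earlier cells."""
        visible = {}
        for names in self.defined[:index]:
            visible.update(names)
        return visible

    def run(self, index):
        visible = self.symbol_table(index)
        scope = dict(visible)
        exec(self.cells[index], scope)
        # Keep only the names this cell added or rebound.
        self.defined[index] = {
            name: value
            for name, value in scope.items()
            if name != "__builtins__"
            and (name not in visible or value is not visible[name])
        }


# Cell 1 uses a name that is only defined in cell 2 (made-up content).
notebook = PositionalNotebook(["total = price * 2", "price = 10"])
notebook.run(1)  # running cell 2 first does not help cell 1

try:
    notebook.run(0)
    error = None
except NameError as exc:
    error = str(exc)

print(error)
print(notebook.symbol_table(2))
```

Whatever order the cells are run in, a cell that depends on something defined below it fails, so a notebook that works once keeps working when it is re-run from top to bottom.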
The symbol table will summarize all variables defined in the current state. As you can see in figure 11, at the beginning of your notebook, the symbol table is either empty or contains the Spark Session if you have enabled Spark support.
If you run all the cells of this notebook, you will have the same symbol table as in figure 12.
You can see the name of the variable and its type.
In addition, you can click on a variable to visualize your data with Polynote’s tools.
Python types are wrapped with
If we try the same experiment as we did with Jupyter (figure 10), we get a "not found" error, as expected (figure 13):
b. Highlighting running code
Polynote highlights the currently running block until it is completed (figure 14). It is a small feature, but it was handy when I used Tensorflow with Polynote to quickly detect the parts that were taking a long time to execute.
c. Code editing
With Jupyter, I used to keep an IDE open when working with new libraries, because you do not get any code editing support at all. Polynote implements code editing capabilities to facilitate your development, such as autocompletion (figure 15):
d. Organization of the dependencies
With Jupyter, you have to define your dependencies in a cell like you do with your code, so everything can get mixed up and become confusing if you are not rigorous enough. Also, you need to know how to use Coursier to include them in the notebook. In contrast, all dependencies in Polynote are defined at the top of the notebook in the part Configuration & dependencies, as shown in figure 16, making them simpler to manage.
Polynote brings a lot of useful features that make using notebooks with Scala easy and pleasant compared to Jupyter. I really appreciate the way Polynote organizes notebooks, and this is why I have switched to this notebook interface. Indeed, an organized notebook makes it easier to collaborate with others.
Finally, this project is open source, so feel free to contribute if you like it!
Thanks for reading, and I hope you will want to try Polynote! Feel free to contact me if you have any questions about this blog post or the example notebooks.
This article was quoted in the podcast: https://scala.love/scala-valentines-5-part-2/