Vincent Brulé

In this article I will explain why I switched from Jupyter to Polynote, and why I think you should too. Polynote is a polyglot notebook with first-class Scala support.

This article follows a talk given at Lunatech Labs and has two parts. We will start by explaining what a notebook is and how we can use it in our everyday developer life. Then we will discover Polynote, a promising open-source notebook solution with first-class Scala support. All the resources and examples are available here.

Notebook and its use cases

A notebook is composed of two parts (figure 1):

A server with a kernel responsible for evaluating your code. For example, Almond is a Scala kernel for Jupyter.
A web interface to write your code or text (generally in Markdown).

Figure 1: Summary of how a notebook works (Jupyter + Almond)

You can write in any language as long as a kernel running on the server side supports it. Here is a list of all the different kernels available for Jupyter notebook. I am pretty sure you can find your favorite language in this list and start using it in a notebook! Once your code is ready, you can evaluate the cell. The code is sent to the server and, after it is processed, the server returns the response to the web interface. Finally, the web interface wraps the result in some HTML to display it nicely. That is the general concept. Each notebook interface adds its own features to provide the best end-user experience.
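The round trip described above can be sketched in a few lines of Python. This is only a toy model, not how Jupyter or Almond are implemented: `ToyKernel` and `render_cell` are hypothetical names, the "server" is a plain object in the same process, and the HTML wrapper is the simplest one I could think of.

```python
import html
import io
from contextlib import redirect_stdout


class ToyKernel:
    """Stands in for the server-side kernel: it keeps state and evaluates code."""

    def __init__(self):
        self.namespace = {}

    def execute(self, code):
        # Run the cell and capture whatever it prints.
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()


def render_cell(kernel, code):
    """Stands in for the web interface: send the code, wrap the reply in HTML."""
    output = kernel.execute(code)
    return '<pre class="output">' + html.escape(output) + "</pre>"


kernel = ToyKernel()
render_cell(kernel, "x = 20 + 22")          # first cell: defines x on the kernel
rendered = render_cell(kernel, "print(x)")  # second cell: reuses the kernel state
print(rendered)
```

The kernel keeps its namespace between calls, which is why the second cell can use the `x` defined by the first one.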

My use cases

I started using notebooks (with Jupyter) during my university classes, for practical exercises. It was a convenient way for our teachers to prepare the lessons. Indeed, a teacher can put the code, the questions, and some explanations in the same file. Then, as a student, all you have to do is install the notebook interface and start working on the teacher’s file. In my school, we used Jupyter. You can install it with a single pip command and you are ready to start. I recently used the same concept to organize workshops, as you can see with the workshop Introduction to Tensorflow in Scala. I made a Dockerfile with everything needed to start the workshop.

It was a great experience, and I kept in mind that notebooks could be useful in my everyday life as a developer. During my graduation internship, I worked on a research project with my supervisor. The difficulty was staying synchronized, because we were not working in the same city. To solve this problem, we used notebooks extensively to interact with each other on the code. In this case, a notebook brings various advantages. For example, you can:

easily share one file which contains everything you need to run it
explain each cell using Markdown, images and LaTeX
run the cells and save the results. For example, with Machine Learning, a cell can take a lot of time to run, so it was an efficient way to share our findings without having to run everything over again
experiment with a solution in multiple languages by simply adding kernels on the server side
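To illustrate the point about saved results, here is a minimal sketch that reads the stored output of a notebook without executing any cell. It assumes the Jupyter `.ipynb` JSON layout (nbformat 4). The notebook content, the `train()` call and the accuracy value are made up for the example, and `saved_outputs` is a hypothetical helper.

```python
import json

# A hand-written notebook in the Jupyter (nbformat 4) JSON layout.
# The cell source and its stored output are invented for illustration.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Training run"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(train())  # slow: imagine hours of training"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["accuracy: 0.93\n"]}
            ],
        },
    ],
})


def saved_outputs(raw):
    """Collect the stored stdout text of every code cell without running anything."""
    notebook = json.loads(raw)
    texts = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "stream":
                texts.append("".join(output["text"]))
    return texts


print(saved_outputs(notebook_json))
```

Whoever receives the file sees the result of the slow cell immediately, because the output travels with the code.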

Notebooks are popular for these reasons and many more. There are many ways to start using a notebook on your machine. In the next part, I will show you why I switched completely to Polynote and why you should try it!

Polynote

Figure 2: Polynote's logo

Now let’s discuss what I think are the two biggest advantages that Polynote provides. If you can, take a look at the attached notebooks at the same time you read this section since they contain more explanations.

Polyglot

The main advantage of Polynote is that it is polyglot. You can use different languages in the same notebook without changing the kernel. For the moment, Polynote supports 5 languages:

Picking the right language for the task at hand greatly improves your experience and performance. As you can see in figure 3, we can easily share Scala data with Python and vice versa. In this example, we receive a list of weather data from the OpenWeather API; we can see that the Scala variable datas can be used transparently in Python. These interactions come with some restrictions (for example, it is easier to use case classes). In addition, Spark DataFrames and Python Pandas DataFrames interact very well.

Figure 3: Interaction between Python and Scala inside a Polynote notebook

Visualization

Polynote wraps the results in HTML and adds some additional visualization features for specific types. Spark DataFrame and Pandas DataFrame have many options for that.

Figure 4: HTML output of a Spark DataFrame

For example, in the output of a Spark DataFrame or Pandas DataFrame, you have a summary of your DataFrame and two buttons with additional options (the icons with a blue circle in figure 4). These buttons open the popup that you can see in figure 5.

Figure 5: Tools to investigate your DataFrames

In my opinion, the most interesting options are View data and Plot data. In the former, you can display all your data in a spreadsheet. In the latter, you can easily plot the data by simply selecting the axes and the type of plot you want as you can see in figure 6. It will generate the corresponding block of Vega code for you.

Figure 6: Plot data option

Vega is a declarative language that allows you to create a lot of different designs as you can see in their examples. If you do not want to use Vega, you can add other plotting libraries such as Matplotlib in Python. But I advise you to try Vega and their examples because you can make powerful and fancy plots to identify edge cases in your data (figure 7). Moreover, Vega works out of the box with Polynote.
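To give an idea of what "declarative" means here, the sketch below builds a small chart specification as a plain Python dictionary and prints it as JSON. It uses Vega-Lite, the higher-level grammar from the Vega project, because its specs are short. The code Polynote generates for you may look different, and both the sample rows and the `bar_chart_spec` helper are made up for the example.

```python
import json

# Made-up sample rows, standing in for the contents of a DataFrame.
rows = [
    {"city": "Rotterdam", "temperature": 11.5},
    {"city": "Paris", "temperature": 14.0},
    {"city": "Lyon", "temperature": 16.2},
]


def bar_chart_spec(values, x_field, y_field):
    """Describe a bar chart declaratively: the data, the mark type and the axes."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": values},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


spec = bar_chart_spec(rows, "city", "temperature")
print(json.dumps(spec, indent=2))
```

You only state which field goes on which axis and which mark to draw. Switching to a line or a scatter plot is a matter of changing `"mark"` to `"line"` or `"point"`.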

The last thing I want to talk about in this section is the WYSIWYG editor (figure 8).

Figure 8: Polynote's WYSIWYG editor

It looks like a small feature but it is useful when you need to style your Markdown snippets and you do not know much about Markdown syntax.

In this menu, you can also open the LaTeX editor (figure 9) to write your formulas in an interactive way.

All these features allow you to make your notebooks understandable and maintainable over time.

Its polyglot support and its many visualization features made me prefer Polynote. If you still are not on board, it offers other important features we’ll see next.

Extra features

a. Order is important

With Jupyter, all cells work with the same global state. If you work with a big notebook, you can easily mess up the order of your logic (figure 10). If you want your Jupyter notebooks to be organized and maintainable, you have to manage everything yourself and be very rigorous if you collaborate with other people on the same notebook.

Figure 10: Order is not important with Jupyter
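The problem with a single global state can be reproduced without any notebook. In this sketch the cell names and contents are made up, and `run_in_order` is a hypothetical helper that executes cells against one shared namespace, the way a kernel with a global state would.

```python
# Three made-up cells of the same notebook.
cells = {
    "cell_1": "total = 10",
    "cell_2": "total = total * 2",
    "cell_3": "report = 'total is ' + str(total)",
}


def run_in_order(order):
    """Run the named cells against ONE shared namespace, like a single global state."""
    shared = {}
    for name in order:
        exec(cells[name], shared)
    return shared["report"]


# Running the notebook once, from top to bottom.
top_to_bottom = run_in_order(["cell_1", "cell_2", "cell_3"])
# Same notebook, but cell_2 was evaluated twice before cell_3.
reran_a_cell = run_in_order(["cell_1", "cell_2", "cell_2", "cell_3"])

print(top_to_bottom)  # total is 20
print(reran_a_cell)   # total is 40
```

The file on disk is identical in both cases. Only the execution history differs, and nothing in the notebook records it.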

Polynote does not use a global state. Each cell has its state defined by all the cells above. As indicated in the documentation:

“This is a powerful way to enforce reproducibility in a notebook; it is far more likely that you’ll be able to re-run the notebook from top to bottom if later cells can’t affect earlier cells.”

The symbol table will summarize all variables defined in the current state. As you can see in figure 11, at the beginning of your notebook, the symbol table is either empty or contains the Spark Session if you have enabled Spark support.

Figure 11: The symbol table at the beginning of this notebook

If you run all the cells of this notebook, you will have the same symbol table as in figure 12.

Figure 12: The symbol table at the end of this notebook

You can see the name of the variable and its type. In addition, you can click on a variable to visualize your data with Polynote’s tools. Python types are wrapped with TypedPythonObject.

If we try the same experiment as we did with Jupyter (figure 10), we get a not found error, as expected (figure 13):

Figure 13: Order is important with Polynote
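The idea that a cell only sees what the cells above it have defined can be modeled in a few lines. This is a toy model in Python, not Polynote's implementation: `OrderedNotebook` and its methods are hypothetical names, and the symbol table here is just a mapping from variable names to type names.

```python
class OrderedNotebook:
    """Toy model: a cell only sees names produced by executed cells above it."""

    def __init__(self, sources):
        self.sources = sources
        # For each cell, the names it defined when it last ran (None if never run).
        self.outputs = [None] * len(sources)

    def state_before(self, index):
        """Merge what the executed cells above `index` defined, in order."""
        state = {}
        for produced in self.outputs[:index]:
            if produced is not None:
                state.update(produced)
        return state

    def run(self, index):
        state = self.state_before(index)
        snapshot = dict(state)
        exec(self.sources[index], state)
        # Keep only the names this cell created or rebound.
        self.outputs[index] = {
            name: value
            for name, value in state.items()
            if name != "__builtins__"
            and (name not in snapshot or snapshot[name] is not value)
        }

    def symbol_table(self, index):
        """Names and type names visible to the cell at `index`."""
        return {
            name: type(value).__name__
            for name, value in self.state_before(index).items()
        }


notebook = OrderedNotebook([
    "print(answer)",  # cell 0 uses a name that is defined further down
    "answer = 42",    # cell 1 defines it
])

notebook.run(1)       # run the lower cell first
try:
    notebook.run(0)   # the upper cell still cannot see `answer`
    error = None
except NameError as exc:
    error = str(exc)

print(error)
print(notebook.symbol_table(0))  # nothing is visible at the top
print(notebook.symbol_table(2))  # state after the last cell
```

With a shared global state, running cell 1 and then cell 0 would have printed 42. Here the upper cell fails, because its state is built only from the cells above it.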

b. Highlighting running code

Polynote will highlight the current running block until it is completed (figure 14). It is a small feature but it was handy when I used Tensorflow with Polynote to quickly detect parts that were taking a long time to execute.

c. Code editing

With Jupyter, I used to keep an IDE open when working with new libraries, because the notebook gave me no code editing assistance at all. Polynote implements code editing capabilities to facilitate your development, such as autocompletion (figure 15):

Figure 15: Autocomplete of Scala and Python code

d. Organization of the dependencies

With Jupyter, you have to define your dependencies in a cell like you do with your code, so everything can get mixed up and become confusing if you are not rigorous enough. Also, you need to know how to use Coursier to include them in the notebook. In contrast, all dependencies in Polynote are defined at the top of the notebook in the Configuration & dependencies section, as shown in figure 16, making them simpler to manage.

Figure 16: Configuration & dependencies section

Conclusion

Polynote brings a lot of useful features that make using notebooks with Scala easy and pleasant compared to Jupyter. I really appreciate the way Polynote keeps notebooks organized, and this is why I have switched to this notebook interface. Indeed, an organized notebook makes it easier to collaborate with others.

Finally, this project is open source, so feel free to contribute if you like the project!

Thanks for reading and I hope you will want to try Polynote! Feel free to contact me if you have any questions about this blog post or the example notebooks.

Resources

This article was quoted in the podcast: https://scala.love/scala-valentines-5-part-2/