If you are more a Data Scientist than a Data Engineer, you’ve just started working in Azure Synapse Analytics Studio and you feel lost and frustrated every now and again, I feel you and I’m here for you. I can only hope you are as lucky as I am, having some very skilled Data Engineering colleagues (my heroes: Henrik Reich, Farzad Bonabi & Ernst Bolle). Without them I would feel completely lost.
Even though there is a whole lot to love when it comes to Azure Synapse Analytics Studio, some should-have-been-easy tasks can cause a lot of question marks and frustration. One of those tasks, for me, was how to save a graph created in a notebook as a png file on the Azure Data Lake Generation 2 (abfss location). Which, in our Azure Synapse, is located here:
I’ve done a fair share of googling, all very unnecessary as I should have just run to my Data Engineering colleagues, who were there to save the/my day again. Of course, as expected, this task was very easy after all. It’s always easy when you know how to do it, isn’t it?
For my fellow Data Scientists, who are either a bit too independent like me, or simply don’t have any Data Engineering colleagues at hand, this short article shows you the very few lines of code needed to use plt.savefig (or any method like this one) to save your graph as a png file on an Azure Data Lake Gen 2 from our Azure Synapse Analytics Studio Notebook.
Or, to make it a bit more general, how to save a file from a Synapse Notebook on a Data Lake Gen 2, using a non-spark kind of Python package. But for this example, I’ll show it using matplotlibs savefig function.
The Python code
import matplotlib.pyplot as plt
# before we can save, for instance, figures in our workspace (or other location) on the Data Lake Gen 2 we need to mount this location in our notebook
# there is no harm in running this cell multiple times
# we need to define the linked service, which for me is Workspaces
# and the name of the workspace, which for me is my own name
mssparkutils.fs.unmount('/jeanineschoonemann)
mssparkutils.fs.mount("abfss://jeanineschoonemann@cldidevdpdsws.dfs.core.windows.net", "/jeanineschoonemann ", {"linkedService":"Workspaces"})
# we’re good to go now already! I told you it was simple :)
# create a sample plot, using some sample data
plt.plot([0, 1, 2, 3, 4], [0, 3, 5, 9, 11])
plt.xlabel('Months')
plt.ylabel('Books Read')
plt.show()
# retrieve the job-id from the Spark environment
jobId = mssparkutils.env.getJobId()
# now save the image as a png file to the location we've mounted before
plt.savefig(f"/synfs/{jobId}/jeanineschoonemann/mybookfolder/books.png")