An article about three functionalities a newbie like me likes about SAS Enterprise Guide
Although I have found my way in the world of different scripting/ coding languages, like multiple SQL dialects, DAX and a bit of Python, I had never opened SAS Enterprise Guide (SAS EG from now on), until a few months ago. I had always been told that SAS was a dark world, a world with beasts that could only be tamed by true SAS experts. Those who had been discovering this world for several years (or those who had followed at least one training course in it). You can imagine my utmost delight when I learned that there was something like PROC SQL.
Now, six months after opening SAS EG for the first time, I am still very glad that PROC SQL exists, but I am actually impressed with SAS EG and glad that I learned to work with it.
So, open SAS EG, PROC SQL, and then what?
In this blog, I highlight three functionalities that I am quite excited about: Creating macros, making (cool) visualizations and performing simple models. I will briefly describe these functionalities and explain how I applied them. Disclaimer: No, I am not a SAS (EG) expert and you will not find a detailed elaboration on how the different functionalities work here. This, the SAS documentation can do a lot better. Use this article for inspiration. And yes, I will add the links to the SAS documentation.
Do you remember the good old macros within Excel? Well, SAS EG has something similar. In essence, a macro is a recorded number of steps you want to execute. Between %MACRO (start of macro) and %MEND (end of macro), you tell very precisely what the function should do when you will use it later. When you are building the macro you can include certain inputs as conditions, for example one or more variables, timestamps, numbers or text. Later, when you want to use the macro, you trigger it by writing %MACRO_NAME(), with the corresponding input between brackets.
You can now run the macro as many times as you want, with inputs as different as you want, without having to copy and adjust the entire script. Can you imagine…
- How much time you save if you want to change something in the script later?
- How easy it is to split up scripts? For example, to save on run time or to make the run more manageable?
- How easily and quickly you can run different models with different inputs?
- How much lines of code this saves? Which benefits readability of your scripts, reduced complexity and handover-time.
In my current assignment in the finance sector, I work a lot with transaction data. Such rapidly increasing tables easily consist of tens of millions of rows. You can probably imagine that this can be a challenge in terms of run-time, when you want to use this data source or when you join it on a product or customer table. Especially if data management has set a maximum on runtime.
To avoid such issues at runtime, the script is rewritten into a macro, dividing the script into smaller time periods. Let’s call the macro %transaction_script(). In doing so, I do not specify a specific start and end date in the script, but instead refer to the conditions I pass along in the macro, e.g. begindate and enddate. Later, when I use the macro, I only need to specify the name of the macro, with associated conditions.
%MACRO transactions_script(timestampID,startdate,enddate); PROC SQL; CONNECT TO database_x(); EXECUTE BY database_x ( CREATE TABLE database.schema.name_×tampID. AS SELECT … FROM … … DISTRIBUTE ON … ) %MEND; %MACRO transactions_script (TS1,'2019-01-01','2019-12-31'); %MACRO transactions_script (TS2,'2020-01-01','2020-12-31'); %MACRO transactions_script (TS3,'2021-01-01','2021-12-31');
Click here for more details on macros in the SAS documentation.
No, SAS EG is not my favorite visualization platform. I wouldn’t recommend it as a dashboard tool or as a feed for important presentations. Still, it can create rather nice visualizations, which can often come in handy. Think of quick explorations of your data or visualizing the output from your clustering model (I’ll come back to this later). It can be very helpful in understanding your data, detecting outliers and mapping your results. And the best of all is that anyone can visualize in SAS EG. Using simple statements, you create useful visuals that you can even customize to some degree.
An example of when visualizing comes in very handy, is when interpreting a correlation matrix (PROC CORR). A correlation matrix without color quickly makes you dizzy, but when the values closer to -1 and +1 stand out, it suddenly becomes a very useful overview. Instead of exporting the results to a visualization tool or to Excel, you can also create a heat map in SAS EG. It may not work as easily, but once you have it set up properly, you can easily reuse it as often as you want.
Click here for the SAS documentation on visualizations.
Finally, modelling in SAS EG. Even for creating relatively simple models, you don’t need to export your data from SAS EG to for example SAS Miner, or to another non-SAS tool. Analyses such as regression tests and cluster methods, which do not require turning too many specific buttons, are quickly created with simple statements.
After the previous paragraphs, it is no surprise that I used the clustering functionality in SAS EG. To be more precise, the K-Means Clustering method (FASTCLUS). I used this analysis to group the target population based on behavioral variables, after which I tried to give them meaning by linking the created clusters to more descriptive variables.
The clustering case is a nice example, as it makes use of the two previously described functionalities as well. From the cluster script I made a macro, with the number of K-Means clusters as a condition, e.g. %cluster_script(K). This allowed me to quickly generate, without taking up too much space in the script, the cluster outcomes of 2 to 10 cluster solutions.
%MACRO clustering_script(K); PROC FASTCLUS DATA= work.data_set OUT= work.out&K. OUTSTAT= work.outstat&K. MAXCLUSTERS= &K. MAXITER= 1000; VAR= variable1 variable2 variable3; RUN; PROC SQL; CREATE TABLE work.clus&K. as ( SELECT &K. as n_clust ,max(case when _type_ = 'RSQ’ then over_all end) as RSQ ,max(case when _type_ = 'CCC’ then over_all end) as CCC ,max(case when _type_ = 'PSEUDO_F’ then over_all end) as PSEUDO_F FROM work.outstat&K. GROUP BY n_clust; RUN; %MEND;
Then, these outcomes could be compared, to determine the best number for K. In making this decision, we were looking for the right balance of explaining additional variation and keeping the results implementable. This can be done using the RSQ, CCC and Pseudo-F outcomes. Which of course you can visualize in a nice line graph, with the explained variation on the Y-axis and the number of clusters on the X-clusters.
Click here for the SAS documentation on different models.
Not too long ago, I knew nothing about SAS EG and was a bit reluctant to work with it myself. However, SAS EG has surprised me in a positive way and has shown me some valuable features. If you’ve read on until here, I hope you’ve found the practical examples useful (or at least enjoyed reading the blog).