KNIME vs. RapidMiner

10 April 2017

There is now a wide variety of software for analysing data, some of it open source. Gartner named both KNIME and RapidMiner among 2017’s market leaders in data science platforms, so it is high time to compare these two programs. If you are interested in seeing a comparison between SAS and IBM SPSS (the two other market leaders in Gartner’s 2017 Magic Quadrant) with R and Python, then be sure to read this article.

 

KNIME (Konstanz Information Miner; Zurich) is an open source analysis platform that came into being in 2004 in response to the need for specific analysis tools for the pharmaceutical industry. This origin is still visible in the specialised chemical analysis options that KNIME offers. KNIME is now also used in many other fields, such as marketing and customer intelligence, and it suits both new analysts and more technical data scientists.

 

RapidMiner (Boston) is an analytics program that was originally created at the Technical University of Dortmund in 2001. It went on to become a fast-growing software company, with more than 300% growth between 2007 and 2012, and is now established in Boston. Like KNIME, it can be used by experienced scientists and new analysts alike.

 

In this blog we are going to compare the free versions of KNIME and RapidMiner. For KNIME that is the KNIME Analytics Platform and for RapidMiner it is the RapidMiner Community Edition. Of course, both products also offer various upgrades so that you can load more or different data, run more or different analyses, or receive more support. We’ll come back to that later in the article.

 

User interface (look and feel)

Let’s start the comparison with the look and feel of the programs. Both use a graphical user interface; you can drag and drop all kinds of operations to add them to your analysis flow (KNIME) or process (RapidMiner). You can then further specify these operations using a menu, e.g. what sort of decision tree you want to run, how you want to split your data into a training and a test set, or how you want to impute your missing values.
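To give an idea of what two of these configurable operations do behind the menu, here is a minimal sketch in plain Python of mean imputation and a random training/test split. The function names and default settings are my own assumptions for illustration; they are not taken from KNIME or RapidMiner, which offer many more options for both steps.

```python
import random
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split it into a training and a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical example data: one missing age, ten row ids.
ages = impute_mean([30, None, 50, 40])
train, test = train_test_split(list(range(10)))
print(ages, len(train), len(test))
```

In both programs you would set the equivalent of `test_fraction` and the imputation strategy in the configuration menu of the operation rather than in code.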

Figure 1: Flow in KNIME
Below you can see the node repository where you can choose from all kinds of operations. You can then drag and drop these to add them to your flow on the right (Knime.org).

Figure 2: Process in RapidMiner
Below you can see operators where you can choose from all kinds of operations. You can drag and drop these to add them to your process on the right (docs.rapidminer.com).

The graphical user interface keeps projects uncluttered, lets you make links between projects and makes it easier to collaborate on one project. In this respect both KNIME and RapidMiner look like SPSS Modeler and SAS Enterprise Miner (and Guide). Major alternatives for analysis in the open source domain, such as R and Python, do not (yet) offer the user a generic graphical user interface and are therefore less accessible than KNIME and RapidMiner.

I reckon KNIME wins this round. Not just because of the clear naming system of the operations and the good use of colours: I am also a big fan of the traffic lights that KNIME uses in its interface, which indicate what stage of processing each operation is at. If there is a red light, the operation still needs to be configured. If there is an amber light, it is ready to run. And if there is a green light, the operation is complete and you can view the output. So you can see how far you have got at a single glance.

 

Methods and Techniques

KNIME and RapidMiner are fairly closely matched when it comes to methods and techniques. They are both good at all kinds of data analysis techniques such as decision trees, segmentation analyses, feature selection, neural networks and many more. They also both have very diverse data preparation techniques in their standard package. KNIME has a choice of more than 1,000 standard operations and RapidMiner has even more, over 1,500. Its integration with Weka etc. gives KNIME a slight advantage because it allows you to be more flexible and to apply more advanced methods and techniques.

All the possible operations are clearly structured in both programs and you can search through them to find what you need. A drawback is that operations are often called something slightly different from what you expect. For example, in KNIME a zero variance filter is called a low variance filter, so you’ll be looking for a while if you search for “zero”.
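The naming makes sense once you see that a zero variance filter is just a low variance filter with the threshold set to zero. The sketch below shows the idea in plain Python. It is an illustration only: the function name, the use of population variance and the "strictly above the threshold" rule are my assumptions, not a description of how the KNIME node is implemented.

```python
from statistics import pvariance


def low_variance_filter(columns, threshold=0.0):
    """Keep columns whose population variance is strictly above the threshold.

    `columns` maps a column name to a list of numbers. With threshold=0.0
    this acts as a zero variance filter: only constant columns are dropped.
    """
    return {name: vals for name, vals in columns.items()
            if pvariance(vals) > threshold}


# Hypothetical example data.
data = {
    "constant": [1, 1, 1, 1],
    "nearly_constant": [1.0, 1.0, 1.0, 1.1],
    "varied": [1, 5, 9, 13],
}
print(list(low_variance_filter(data)))                  # drops "constant" only
print(list(low_variance_filter(data, threshold=0.01)))  # also drops "nearly_constant"
```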

Furthermore, in both KNIME and RapidMiner you can create loops to make operations run on a large number of variables. I have not encountered a single operation in either program that I wanted to run but couldn’t find. Therefore this round is a draw.

 

Cooperation with Other Packages

RapidMiner and KNIME are both able to integrate with R and Python. Both programs require you to install a free extra plug-in that allows you to invoke R and Python scripts from your flow or process. With KNIME you also have the option to integrate with Weka, SQL, Java and more. In RapidMiner you can read data over any JDBC connection and you can link directly to Twitter and Salesforce. In both programs you can load files such as Excel, CSV, SAS, Stata, SPSS, XML and many more. Because there are no major differences between the two in terms of integration, this round is another draw.

 

User-Friendliness

As mentioned above, both programs have a graphical interface which means that even if you are a beginner you can get straight to work on your analyses. Furthermore, both programs allow you to insert “code snippets” (e.g. in Java) for more complex operations that are not included in the standard package. They also offer an extensive description of each operation (node) with an explanation and an example of how you can best carry out the operation.

Both programs also have the option to add “post-its” to your flow so you can annotate your analyses, e.g. to show other people what you have done and why. Both also allow you to group several operations into a folder (or rather a “metanode” in KNIME and a “subprocess” in RapidMiner) to keep your process uncluttered.

I think KNIME wins in this category by a long way, for three reasons. Firstly, in KNIME you can link the output of one operation to multiple other operations simply by drawing multiple lines (see the KNIME flow above for how links are sometimes split). With RapidMiner, you always need to add a multiplier operation before you can connect one output to multiple operations.

Secondly, in KNIME you can run operations individually, which is handy when you want to test a small part of your flow, when you only want to re-run the last part of your flow, or when you want to run a big operation once rather than every time. In RapidMiner you can only run the entire process. Although you can manually deactivate individual operations, that often also deletes the connections between the operations, which creates extra work when you want to add the operation back into your process.

Finally, in RapidMiner you always have to link your process to the out-port to get output (in the RapidMiner process above you can see a circle on the right: this is the out-port). Especially if you only want to run part of your process, it is really fiddly to link the last operation to the out-port.

 

Support

Neither RapidMiner nor KNIME includes technical support in the free version. You can of course buy it by upgrading your package. However, both programs do have a large community where you can search for all kinds of questions and answers. For RapidMiner there are many different forums to complement the official documentation; for KNIME you mostly get referred on to KNIME’s official channels, including KNIME TV and the KNIME Forum.

RapidMiner has a function called “Wisdom of the Crowds” where you can get recommendations on what other users did next based on the operations you currently have in your process (see the “recommended operators” in the RapidMiner process screenshot at the bottom).

KNIME has a similar function called the “Workflow Coach”. This coach shows what step to take next based on data from all users. But you can also get it to only analyse your own flows and therefore get it to predict a helpful way to proceed next based on your own data.

Since there is more documentation for RapidMiner and it is a bit easier to find out what capabilities the program offers, RapidMiner wins this round.

 

Costs

Both programs have a free version, which is what we compared here. But if you want to do more than these versions allow, you can of course buy extra options. The Community Edition of RapidMiner can only load a maximum of 10,000 rows of data. It doesn’t take long to reach that amount, and you will then need to upgrade to the Small Edition (100,000 rows, €2,500 per year per user), to the Medium Edition (1,000,000 rows, €5,000 per year per user) or to the Large Edition (unlimited rows, €10,000 per year per user). Of course, all these packages also give you customer support, and each package gives you access to more processors so you can carry out your analyses more quickly.
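To see what this means for a team, the tier arithmetic can be sketched in a few lines of Python. The row limits and prices are the 2017 figures quoted above; the function names are hypothetical and the sketch assumes every user needs the same edition.

```python
# Row limits and yearly price per user (EUR) as quoted in this article (2017).
EDITIONS = [
    ("Community", 10_000, 0),
    ("Small", 100_000, 2_500),
    ("Medium", 1_000_000, 5_000),
    ("Large", None, 10_000),  # None means unlimited rows
]


def cheapest_edition(rows):
    """Return (name, yearly price per user) of the smallest edition that fits."""
    for name, limit, price in EDITIONS:
        if limit is None or rows <= limit:
            return name, price


def yearly_cost(rows, users):
    """Yearly cost for a team, assuming every user needs the same edition."""
    name, price = cheapest_edition(rows)
    return name, price * users


# Hypothetical example: three analysts working with 250,000 rows.
print(yearly_cost(250_000, 3))
```

With these figures, a team of three working with 250,000 rows would need the Medium Edition and pay €15,000 per year.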

With KNIME the amount of data and the number of processors you use are not limited. A useful upgrade, which can be especially important for collaboration within a team, is KNIME Teamspace, which allows you to work with multiple people at the same time in the same KNIME workspace (€2,000 per user).
With RapidMiner you will start paying from the moment you start working with serious quantities of data. With KNIME you will start paying from the moment you want to collaborate on projects and KNIME becomes part of your operational processes.

 

The Choice

After comparing them, my choice is: KNIME.

 

As you can see above and as explained in each section, I prefer KNIME. This program is more user-friendly, has a better graphical interface and is cheaper than RapidMiner (assuming that in our field of work you will need more than 10,000 rows). The fact that support is slightly better for RapidMiner is just something we have to take on the chin.
