Computational Statistics in Data Science, by a group of authors

3.5 Java


Java is one of the most popular programming languages (according to the TIOBE index, www.tiobe.com/tiobe‐index/), partly due to its extensive library ecosystem. Java's design appeals to programmers: it is simple, object‐oriented, and portable. Java applications run on any machine, from personal laptops to high‐performance supercomputers, and even on game consoles and Internet of Things (IoT) devices. Notably, Android (based on Java) development has driven recent Java innovations. Java's “write once, run anywhere” adage provides versatility, triggering interest even at the research level.

Developers may prefer Java for intensive calculations that run slowly in scripting languages (e.g., R). For speed‐up purposes, Java's cross‐platform design could even be preferred to C/C++ in certain cases. Alternatively, Java code can be wrapped in an R package for faster processing. For example, the rJava package allows one to call Java code from an R script and, conversely, to call R functions from Java. Java can also be used on its own for statistical analysis, thanks to a solid set of statistical libraries.

Popular sources of native Java statistical and mathematical functionality are the JSC (Java Statistical Classes) and Apache Commons Math application programming interfaces (APIs) (http://commons.apache.org/proper/commons‐math/). The JSC and Apache Commons Math libraries implement many methods, including univariate statistics, parametric and nonparametric tests (t‐test, chi‐square test, and Wilcoxon test), random number generation, random sampling/resampling, regression, correlation, linear or stochastic optimization, and clustering.
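To make the kind of computation these libraries package more concrete, the following is a minimal sketch, in plain Java with no external dependencies, of univariate statistics and Welch's two‐sample t statistic. The class and method names (TTestSketch, welchT, welchDf) are hypothetical and chosen for illustration only; they are not part of JSC or Apache Commons Math.

```java
// Hypothetical illustration class; not part of JSC or Apache Commons Math.
final class TTestSketch {

    // Arithmetic mean of a non-empty sample.
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Unbiased sample variance (divides by n - 1); needs at least two values.
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss / (x.length - 1);
    }

    // Welch's two-sample t statistic (does not assume equal variances).
    static double welchT(double[] a, double[] b) {
        double se = Math.sqrt(variance(a) / a.length + variance(b) / b.length);
        return (mean(a) - mean(b)) / se;
    }

    // Welch-Satterthwaite approximation to the degrees of freedom.
    static double welchDf(double[] a, double[] b) {
        double va = variance(a) / a.length;
        double vb = variance(b) / b.length;
        return (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4, 5};
        double[] b = {2, 4, 6, 8, 10};
        System.out.printf("t = %.4f, df = %.2f%n", welchT(a, b), welchDf(a, b));
    }
}
```

Turning the statistic into a p‐value additionally requires the cumulative distribution function of Student's t distribution, which is where a tested library is preferable to hand‐written code. In Apache Commons Math 3, for instance, the TTest class in org.apache.commons.math3.stat.inference covers this use case.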

Additionally, Java boasts a large number of machine‐learning (ML) packages and big data capabilities. For example, the WEKA [21] tool and the JSAT library [22] are written in Java, and the TensorFlow framework [23] offers a Java API. Moreover, Java gives access to one of the most sought‐after big data analysis tools, Apache Spark [24]. Spark provides ML support through modules in the Spark MLlib library [25].

As with the other software discussed, Java APIs often require importing other packages/libraries. For example, developers commonly use external matrix‐operation libraries, such as JAMA (Java matrix package, https://math.nist.gov/javanumerics/jama/) or EJML (efficient Java matrix library, http://ejml.org/wiki/). Such packages handle routine computations, for example, matrix decomposition and dense/sparse matrix calculation. JFreeChart enables data visualization by generating scatter plots, histograms, bar plots, and so on. Recently, these Java visualization libraries have increasingly been replaced by more popular JavaScript libraries such as Plot.ly (https://plot.ly/), Bokeh (bokeh.pydata.org), D3 [26], or Highcharts (www.highcharts.com).
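As an illustration of what “matrix decomposition” involves, the sketch below computes a Cholesky factorization of a small dense symmetric positive definite matrix in plain Java. It is a teaching example under stated assumptions (a square, symmetric input stored as double[][]), and the class name CholeskySketch is hypothetical; it does not reproduce the JAMA or EJML APIs.

```java
// Hypothetical illustration class; it does not reproduce the JAMA or EJML APIs.
final class CholeskySketch {

    // Returns the lower-triangular factor L with A = L * L^T.
    // Assumes A is square and symmetric; throws if A is not positive definite.
    static double[][] cholesky(double[][] a) {
        int n = a.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                // Subtract the contribution of the columns already computed.
                double sum = a[i][j];
                for (int k = 0; k < j; k++) {
                    sum -= l[i][k] * l[j][k];
                }
                if (i == j) {
                    if (sum <= 0.0) {
                        throw new IllegalArgumentException("matrix is not positive definite");
                    }
                    l[i][i] = Math.sqrt(sum);
                } else {
                    l[i][j] = sum / l[j][j];
                }
            }
        }
        return l;
    }

    public static void main(String[] args) {
        double[][] a = {{4, 2}, {2, 3}};
        double[][] l = cholesky(a);
        for (double[] row : l) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

In practice, a dedicated matrix library is usually the better choice, since it adds pivoting, sparse storage, and numerical safeguards that a short sketch like this omits.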

As outlined above, Java can serve as a useful statistical software solution, especially for developers who are already familiar with it or who are interested in cross‐platform development. We therefore recommend it to seasoned programmers looking to add some statistical punch to their desktop, web, and mobile apps. For the analysis of big data, Java offers some of the best ML tools available.
