Software

Here you will find software created by the AG Datenbionik for scientific purposes, which we make publicly available. Please cite the corresponding publications when using them.

Name | Language | License | Link | Authors
DataIO | R | GPL | GitHub | Alfred Ultsch, Florian Lerch, Michael Thrun, Catharina Lippmann, Felix Pape, Onno Hansen-Goos, Sabine Herda
DataVisualizations | R | GPL | CRAN | Michael Thrun, Felix Pape, Onno Hansen-Goos, Fredericke Matz, Alfred Ultsch
DatabionicSwarm | R | GPL | CRAN | Michael Thrun
ProjectionBasedClustering | R | GPL | CRAN | Michael Thrun, Florian Lerch, Felix Pape, Kristian Nybo, Jarkko Venna
GeneralizedUmatrix | R | GPL | CRAN | Michael Thrun, Alfred Ultsch
Umatrix | R | GPL | Download, Manual, First Steps | Florian Lerch, Michael Thrun, Alfred Ultsch
AdaptGauss: Gaussian Mixture Models (GMM) | R | GPL | CRAN | Michael Thrun, Onno Hansen-Goos, Rabea Griese, Catharina Lippmann, Florian Lerch, Jörn Lötsch, Alfred Ultsch
ABCanalysis | R | GPL | CRAN, Online | Michael Thrun, Florian Lerch, Jörn Lötsch, Alfred Ultsch
Vademecum | Java | GPL | SourceForge Project | Torben Rühl, Steffen Springer, Burcu Dalmis, Jan Kohlhof, Dirk Schäfer
Databionic ESOM Tools | Java | GPL | SourceForge Project | Christan Stamm, Mario Nöcker, Fabian Mörchen, and many others
Databionic MusicMiner | Java | GPL | SourceForge Project | Mario Nöcker, Christan Stamm, Fabian Mörchen, Niko Efthymiou, Michael Thies, Ingo Löhken, and many others
Time Series Knowledge Mining | Matlab | GPL | Download | Fabian Mörchen
Pareto Density Estimation | R | GPL | CRAN | Michael Thrun, Onno Hansen-Goos, Rabea Griese, Catharina Lippmann, Jörn Lötsch, Alfred Ultsch
Persist Time Series Discretization | Matlab | GPL | Download | Fabian Mörchen
Audio Feature Extraction | Matlab | GPL | On request | Ingo Löhken, Michael Thies, Fabian Mörchen
DWT/DFT time series feature extraction | Matlab | GPL | Download | Fabian Mörchen
LaTeX/PDF Reports | Matlab | GPL | Download | Fabian Mörchen
Spin3D | Java | GPL | SourceForge Project | Pascal Lehwark

Generalized Umatrix

Projections from a high-dimensional data space onto a two-dimensional plane are used to detect structures, such as clusters, in multivariate data. The generalized Umatrix is able to visualize errors of these two-dimensional scatter plots by using a 3D topographic map.

Ultsch, A., & Thrun, M. C.: Credible Visualizations for Planar Projections, in Cottrell, M. (Ed.), 12th International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM), IEEE Xplore, France, 2017.

Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, https://doi.org/10.1007/978-3-658-20540-9, 2018. 

Databionic Swarm

The databionic swarm (DBS) is a swarm system that adapts itself to structures of high-dimensional data, such as natural clusters characterized by distance- and/or density-based structures in the data space. DBS consists of three modules. The first module is the parameter-free projection method Pswarm, which exploits the concepts of self-organization and emergence, game theory, swarm intelligence and symmetry considerations. The second module is a parameter-free high-dimensional data visualization technique, which generates projected points on a topographic map with hypsometric colors based on the generalized U-matrix. The third module is the clustering method itself, which has only non-critical parameters. The clustering can be verified by the visualization and vice versa.

Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, DOI: 10.1007/978-3-658-20540-9, 2018.

Projection Based Clustering

Thrun, M. C., & Ultsch, A.: Projection based Clustering, Conf. Int. Federation of Classification Societies (IFCS), DOI: 10.13140/RG.2.2.13124.53124, Tokyo, 2017.

DataVisualizations

Various visualizations of high-dimensional data are provided here, such as the heat map and the silhouette plot for grouped data, visualizations of the distribution of distances, the scatter-density plot for two variables, the Shepard density plot and many more. Additionally, ‘DataVisualizations’ makes it possible to inspect the distribution of each feature of a dataset visually by combining four methods.

Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, In Baier, D. & Wernecke, K. D. (Eds.), Innovations in classification, data science, and information systems, (Vol. 27, pp. 91-100), Berlin, Germany, Springer, 2005.

Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, accepted, Foundation of the Cracow University of Economics, Zakopane, Poland, 2018.

Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, (Ultsch, A. & Huellermeier, E. Eds., 10.1007/978-3-658-20540-9), Doctoral dissertation, Heidelberg, Springer, ISBN: 978-3658205393, 2018.

Umatrix

Interactive R tool for ESOM computation, U-matrix and P-matrix generation, as well as U*-matrix generation and automatic extraction of islands with interactive clustering. Coming soon to CRAN; a beta version is already available in advance on this website. The following packages have to be installed (imports): Rcpp, ggplot2, shiny, ABCanalysis, shinyjs, reshape2, fields, plyr, abind, tcltk, png, tools, grid, rgl.

Thrun, M. C., Lerch, F., Lötsch, J., & Ultsch, A.: Visualization and 3D Printing of Multivariate Data of Biomarkers, Proc. of International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, Plzen, 2016.
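The U-matrix and P-matrix named above can be illustrated with a short NumPy sketch. It is not the package's code: it assumes an already trained ESOM given as a hypothetical `weights` array of shape (rows, cols, dims) on a toroidal grid, uses an 8-neighbourhood for the U-heights (mean distance to the neighbouring weight vectors), and counts the data points within a caller-supplied radius for the P-heights (Ultsch uses the Pareto radius for this). The U*-matrix, which combines both, is not shown.

```python
import numpy as np


def umatrix(weights):
    """U-heights of a trained ESOM on a toroidal grid (sketch).

    weights: array of shape (rows, cols, dims), one weight vector per neuron.
    The U-height of a neuron is the mean Euclidean distance between its weight
    vector and those of its 8 grid neighbours, with wrap-around at the borders.
    """
    heights = np.zeros(weights.shape[:2])
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    for dr, dc in offsets:
        # neighbour[r, c] == weights[(r + dr) % rows, (c + dc) % cols]
        neighbour = np.roll(weights, shift=(-dr, -dc), axis=(0, 1))
        heights += np.linalg.norm(weights - neighbour, axis=2)
    return heights / len(offsets)


def pmatrix(weights, data, radius):
    """P-heights: number of data points within `radius` of each weight vector."""
    flat = weights.reshape(-1, weights.shape[2])
    dists = np.linalg.norm(flat[:, None, :] - data[None, :, :], axis=2)
    return (dists <= radius).sum(axis=1).reshape(weights.shape[:2])


# Usage: a 4x4 toroidal grid in which a single neuron differs from all others.
weights = np.zeros((4, 4, 2))
weights[0, 0] = [1.0, 0.0]
data = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 0.0]])
u = umatrix(weights)
p = pmatrix(weights, data, radius=0.5)
print(u[0, 0], u[1, 1])  # 1.0 0.125
```

High U-heights mark neighbouring prototypes that are far apart in the data space (the "walls" of the topographic map), while high P-heights mark dense regions.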

AdaptGauss

For a given data vector, the package provides a density estimate based on Pareto density estimation (PDE) [Ultsch 2005]. In an interactive tool, a Gaussian mixture model (GMM) can be fitted manually or automatically (expectation-maximization algorithm) with the help of this density estimate. The GMM can be verified with a QQ plot or a chi-square goodness-of-fit test. Boundaries between the components of the GMM are calculated using Bayes’ theorem.

Ultsch, A., Thrun, M. C., Hansen-Goos, O., Lötsch, J.: Identification of Molecular Fingerprints in Human Heat Pain Thresholds by Use of an Interactive Mixture Model R Toolbox (AdaptGauss), International Journal of Molecular Sciences, DOI: 10.3390/ijms161025897, 2015.

Thrun, M. C., Ultsch, A.: Models of Income Distributions for Knowledge Discovery, European Conference on Data Analysis, DOI: 10.13140/RG.2.1.4463.0244, Colchester, 2015.
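A Bayes boundary of this kind can be sketched for two one-dimensional Gaussian components: by Bayes' theorem, the boundary is the point where the weighted densities w1·N(x; m1, s1) and w2·N(x; m2, s2), and hence the posteriors of the two components, are equal. The function below is a hypothetical helper, not part of AdaptGauss; it solves the resulting quadratic equation and, if there are two solutions, keeps the one between the means.

```python
import math


def bayes_boundary(m1, s1, w1, m2, s2, w2):
    """Point x where w1*N(x; m1, s1) == w2*N(x; m2, s2).

    Taking logs of both weighted densities and rearranging gives
    a*x^2 + b*x + c = 0. If there are two roots, the one between the
    two means (or, failing that, the one closest to their midpoint) is returned.
    """
    a = 1 / (2 * s2**2) - 1 / (2 * s1**2)
    b = m1 / s1**2 - m2 / s2**2
    c = m2**2 / (2 * s2**2) - m1**2 / (2 * s1**2) + math.log((w1 * s2) / (w2 * s1))
    if abs(a) < 1e-12:  # equal variances: the equation is linear in x
        return -c / b
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("the weighted densities do not intersect")
    roots = [(-b + sgn * math.sqrt(disc)) / (2 * a) for sgn in (1, -1)]
    lo, hi = min(m1, m2), max(m1, m2)
    between = [r for r in roots if lo <= r <= hi]
    mid = (m1 + m2) / 2
    return min(between or roots, key=lambda r: abs(r - mid))


# Two equally weighted components with equal spread: the boundary is the midpoint.
print(bayes_boundary(0.0, 1.0, 0.5, 2.0, 1.0, 0.5))  # 1.0
```

With unequal variances, the weighted densities can intersect twice; the second intersection lies in the tail of the narrower component and is usually not the boundary of interest.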

ABC Analysis

For a given data set, the package provides a new method in the R programming language for calculating precise, easily interpretable boundaries between subgroups. Closely related to the Lorenz curve, the ABC curve visualizes the data by graphing the cumulative distribution function. Based on an ABC analysis, the algorithm uses the ABC curve to calculate the optimal limits by exploiting the mathematical properties of the distribution of the analyzed items. The data consist of positive values and are divided into three disjoint subsets A, B and C: subset A contains the very profitable values, i.e., the largest data values (“the most important”); subset B contains the values for which the yield equals the effort required to obtain it; and subset C contains the non-profitable values, i.e., the smallest data values (“the trivial”).

Ultsch, A., Lötsch, J.: Computed ABC analysis for rational selection of most informative variables in multivariate data, PLoS One, 2015.
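The ABC curve itself can be sketched in a few lines: sort the values in descending order, then plot the fraction of items against the fraction of the total sum they cover. The sketch below also locates two reference points on the curve: the point closest to the ideal point (0, 1) and the break-even point, where the slope of the curve drops below 1. The published method derives the A|B and B|C limits from characteristic points of this kind; the exact rules are given in Ultsch & Lötsch (2015) and implemented in the ABCanalysis package, and are not reproduced here. The function names are hypothetical.

```python
import numpy as np


def abc_curve(values):
    """ABC curve of positive values: x = fraction of items (largest first),
    y = fraction of the total sum covered by those items."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]
    n = len(v)
    x = np.arange(1, n + 1) / n
    y = np.cumsum(v) / v.sum()
    return x, y


def characteristic_points(values):
    """Indices (into the descending-sorted values) of two reference points.

    pareto: the curve point closest to the ideal point (0, 1).
    break_even: the last point at which the slope is still >= 1. Between
    consecutive points the slope is n * v_i / sum(v), so this is the last
    item whose value is at least the mean.
    """
    x, y = abc_curve(values)
    pareto = int(np.argmin(x**2 + (1 - y) ** 2))
    v = np.sort(np.asarray(values, dtype=float))[::-1]
    break_even = int(np.nonzero(v >= v.mean())[0][-1])
    return pareto, break_even


x, y = abc_curve([10, 5, 3, 1, 1])
print(characteristic_points([10, 5, 3, 1, 1]))  # (1, 1)
```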

Vademecum

As part of a student project, the data mining suite “Vademecum” was developed. It supports and guides the user through the knowledge discovery process and helps prevent mistakes along the way. For further information, please visit the SourceForge project.

Databionic ESOM Tools

As part of a project group, we developed the Databionic ESOM Tools, a software package for training, visualization and interactive analysis of emergent self-organizing feature maps (ESOM). The software is available under the GPL. For further information, please visit the SourceForge project.
Ultsch, A., Mörchen, F.: ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM, Technical Report No. 46, Dept. of Mathematics and Computer Science, University of Marburg, Germany, (2005)

Databionic MusicMiner

As part of a project group, we developed the Databionic MusicMiner. The program computes the similarity of music pieces from their sound and displays a music collection as a map based on these similarities. The software is available under the GPL. For further information, please visit the SourceForge project.
Mörchen, F., Ultsch, A., Thies, M., Löhken, I., Nöcker, M., Stamm, C., Efthymiou, N., Kümmerer, M.: MusicMiner: Visualizing timbre distances of music as topographical maps, Technical Report No. 47, Dept. of Mathematics and Computer Science, University of Marburg, Germany, (2005)

Time Series Knowledge Mining

Time Series Knowledge Mining (TSKM) is a methodology for finding understandable patterns in multivariate time series.
Download
Mörchen, F.: Time Series Knowledge Mining, PhD thesis, Dept. of Mathematics and Computer Science, University of Marburg, Germany, (2006)

Pareto Density Estimation

Pareto Density Estimation (PDE) is an information-optimal estimation of the empirical probability density. We provide an implementation for R in the AdaptGauss package.
Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D., Wernecke, K. D. (Eds.), Innovations in classification, data science, and information systems, Proc. GfKl 2003, pp. 91-100, Springer, Berlin, 2005.
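The core idea of PDE can be sketched for one-dimensional data: the density at a point is proportional to the number of data points inside a sphere (here an interval) of fixed radius around it. The original method chooses this Pareto radius so that a sphere contains about 20% of the data on average; the sketch below approximates it by the 20th percentile of all pairwise distances, which is an assumption, and the AdaptGauss implementation may refine the radius differently. Function names are hypothetical.

```python
import numpy as np


def pareto_radius(data, percentile=20.0):
    """Approximate Pareto radius: a percentile of all pairwise distances
    (assumption: the 20th, so an interval holds ~20% of the points on average)."""
    x = np.asarray(data, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    return float(np.percentile(np.abs(x[i] - x[j]), percentile))


def pde(data, kernels, radius=None):
    """Density at each kernel point: number of data points within the radius,
    divided by n * 2r so that the estimate integrates to about one."""
    x = np.asarray(data, dtype=float)
    r = pareto_radius(x) if radius is None else radius
    kernels = np.asarray(kernels, dtype=float)
    counts = (np.abs(x[None, :] - kernels[:, None]) <= r).sum(axis=1)
    return counts / (len(x) * 2 * r)


print(pde([0, 1, 2, 3, 4], kernels=[0, 2]))  # [0.2 0.3]
```

With a fixed radius, this is a uniform-kernel density estimate; the information-optimal choice of the radius is what distinguishes PDE.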

Persist Time Series Discretization

The Persist algorithm allows a discretization of time series into states of optimal duration. In contrast to conventional static histogram methods, the temporal sequence of values is used to optimize the bins. We provide an implementation for Matlab under the GPL. Download.
Mörchen, F., Ultsch, A.: Optimizing Time Series Discretization for Knowledge Discovery, in Grossman, R. L., Bayardo, R., Bennett, K., Vaidya, J. (Eds.), Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, (2005), pp. 660-665
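The idea of scoring bins by the temporal order of the values can be sketched as follows. This is a simplified reading of the Persist paper, not the Matlab implementation: for each state j, the self-transition probability P(j|j) is compared with the marginal probability P(j) via the symmetric Kullback-Leibler divergence of the two Bernoulli distributions, signed positive when the state persists longer than chance; cut points are then added greedily. Using percentiles of the data as candidate cut points and clipping probabilities away from 0 and 1 are assumptions of this sketch, and the function names are hypothetical.

```python
import numpy as np


def persistence_score(states, k):
    """Mean signed persistence of a symbolic series with states 0..k-1."""
    eps = 1e-12
    scores = []
    for j in range(k):
        in_j = states == j
        prev = in_j[:-1]
        if prev.sum() == 0:  # state does not occur before the last time step
            scores.append(0.0)
            continue
        p = np.clip((prev & in_j[1:]).sum() / prev.sum(), eps, 1 - eps)  # P(j|j)
        q = np.clip(in_j.mean(), eps, 1 - eps)                            # P(j)
        # symmetric KL divergence of Bernoulli(p) and Bernoulli(q), always >= 0
        skl = (p - q) * (np.log(p / q) - np.log((1 - p) / (1 - q)))
        scores.append(np.sign(p - q) * skl)
    return float(np.mean(scores))


def persist_cuts(x, n_bins, n_candidates=99):
    """Greedily choose n_bins - 1 cut points from percentile candidates so that
    the mean persistence score of the discretized series is maximal."""
    x = np.asarray(x, dtype=float)
    candidates = np.unique(np.percentile(x, np.linspace(0, 100, n_candidates + 2)[1:-1]))
    cuts = []
    for _ in range(n_bins - 1):
        best, best_score = None, -np.inf
        for c in candidates:
            if c in cuts:
                continue
            trial = np.sort(cuts + [c])
            score = persistence_score(np.digitize(x, trial), len(trial) + 1)
            if score > best_score:
                best, best_score = c, score
        if best is None:
            break
        cuts.append(best)
    return np.sort(np.array(cuts))


x = np.r_[np.zeros(50), np.full(50, 10.0)]  # a series with two long-lasting levels
print(persist_cuts(x, n_bins=2))  # [5.]
```

A static histogram method would look only at the value distribution; here an alternating series such as 0, 1, 0, 1, ... gets a negative score because its states persist less than chance.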

Audio Feature Extraction

Music data are often analyzed using sound features calculated on short time windows. A well-known example is the set of Mel-frequency cepstral coefficients (MFCC). As part of a project group, we developed flexible software for computing a large number of such sound features. A Matlab implementation is available under the GPL on request.
Mörchen, F., Ultsch, A., Thies, M., Löhken, I.: Modelling timbre distance with temporal statistics from polyphonic music, IEEE Transactions on Speech and Audio Processing 14(1), pp. 81-90, 2006.

DWT/DFT time series feature extraction

In terms of energy preservation, the best selection of coefficients from the Discrete Wavelet Transform (DWT) or the Discrete Fourier Transform (DFT) of a single time series is to take them in descending order of magnitude. For a set of time series, such as those used for clustering or classification, this leads to representations that are hard to compare, since different coefficients may be selected for each time series. We have therefore proposed a global selection strategy that combines a comparable representation with good energy preservation. We provide an implementation for Matlab under the GPL: Download.
Mörchen, F.: Time series feature extraction for data mining using DWT and DFT, Technical Report No. 33, Dept. of Mathematics and Computer Science, University of Marburg, Germany, (2003)
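A global selection of this kind can be sketched for the DFT case with NumPy. The sketch assumes that "global" means ranking coefficient indices by their mean energy (squared magnitude) over the whole set of series and keeping the same top-k indices for every series; see the technical report for the exact criterion. The DWT variant would need a wavelet library and is not shown, and the function name is hypothetical.

```python
import numpy as np


def global_dft_selection(series, k):
    """Select the same k DFT coefficients for every series in a set.

    series: array of shape (n_series, length). The coefficient indices with the
    largest mean energy over all series are kept, so all series are described by
    the same features.
    """
    coeffs = np.fft.rfft(np.asarray(series, dtype=float), axis=1)
    energy = np.abs(coeffs) ** 2
    idx = np.sort(np.argsort(energy.mean(axis=0))[::-1][:k])
    # share of each series' one-sided spectral energy kept by the shared selection
    kept = energy[:, idx].sum(axis=1) / energy.sum(axis=1)
    return idx, coeffs[:, idx], kept


t = np.arange(64)
s1 = np.sin(2 * np.pi * 3 * t / 64)
s2 = s1 + 0.5 * np.sin(2 * np.pi * 5 * t / 64)
idx, features, kept = global_dft_selection(np.vstack([s1, s2]), k=2)
print(idx)  # [3 5]
```

Selecting per series would pick only index 3 for `s1` but indices 3 and 5 for `s2`; the shared selection keeps both series comparable at the cost of possibly storing a coefficient that carries little energy for some series.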

LaTeX/PDF Reports

This small toolbox allows the creation of PDF reports with Matlab functions. By attaching results in the form of tables and images, documentation is created automatically and can conveniently be analyzed later. LaTeX and Ghostscript are required as additional software.

Spin3D

Spin3D: OpenGL visualization tool for high-dimensional data.