Here you will find software developed by the AG Datenbionik for scientific purposes, which we make publicly available. Please cite the corresponding publications when using it.
Name | Language | License | Link | Authors
DataIO | R | GPL | GitHub | Alfred Ultsch, Florian Lerch, Michael Thrun, Catharina Lippmann, Felix Pape, Onno Hansen-Goos, Sabine Herda
DataVisualizations | R | GPL | CRAN | Michael Thrun, Felix Pape, Onno Hansen-Goos, Fredericke Matz, Alfred Ultsch
DatabionicSwarm | R | GPL | CRAN | Michael Thrun
ProjectionBasedClustering | R | GPL | CRAN | Michael Thrun, Florian Lerch, Felix Pape, Kristian Nybo, Jarkko Venna
GeneralizedUmatrix | R | GPL | CRAN | Michael Thrun, Alfred Ultsch
Umatrix | R | GPL | Download, Manual, First Steps | Florian Lerch, Michael Thrun, Alfred Ultsch
AdaptGauss: Gaussian Mixture Models (GMM) | R | GPL | CRAN | Michael Thrun, Onno Hansen-Goos, Rabea Griese, Catharina Lippmann, Florian Lerch, Jörn Lötsch, Alfred Ultsch
ABCanalysis | R | GPL | CRAN, Online | Michael Thrun, Florian Lerch, Jörn Lötsch, Alfred Ultsch
Vademecum | Java | GPL | SourceForge project | Torben Rühl, Steffen Springer, Burcu Dalmis, Jan Kohlhof, Dirk Schäfer
Databionic ESOM Tools | Java | GPL | SourceForge project | Christian Stamm, Mario Nöcker, Fabian Mörchen, and many others
Databionic MusicMiner | Java | GPL | SourceForge project | Mario Nöcker, Christian Stamm, Fabian Mörchen, Niko Efthymiou, Michael Thies, Ingo Löhken, and many others
Time Series Knowledge Mining | Matlab | GPL | Download | Fabian Mörchen
Pareto Density Estimation | R | GPL | CRAN | Michael Thrun, Onno Hansen-Goos, Rabea Griese, Catharina Lippmann, Jörn Lötsch, Alfred Ultsch
Persist Time Series Discretization | Matlab | GPL | Download | Fabian Mörchen
Audio Feature Extraction | Matlab | GPL | On request | Ingo Löhken, Michael Thies, Fabian Mörchen
DWT/DFT time series feature extraction | Matlab | GPL | Download | Fabian Mörchen
LaTeX/PDF Reports | Matlab | GPL | Download | Fabian Mörchen
Spin3D | Java | GPL | SourceForge project | Pascal Lehwark
Generalized U-Matrix
Projections from a high-dimensional data space onto a two-dimensional plane are used to detect structures, such as clusters, in multivariate data. The generalized U-matrix visualizes the projection errors of these two-dimensional scatter plots as a 3D topographic map.
Ultsch, A., & Thrun, M. C.: Credible Visualizations for Planar Projections, in Cottrell, M. (Ed.), 12th International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM), IEEE Xplore, France, 2017.
Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, https://doi.org/10.1007/978-3-658-20540-9, 2018.
Databionic Swarm
The databionic swarm (DBS) is a swarm system that adapts itself to structures of high-dimensional data, such as natural clusters characterized by distance- and/or density-based structures in the data space. DBS consists of three modules. The first module is the parameter-free projection method Pswarm, which exploits the concepts of self-organization and emergence, game theory, swarm intelligence, and symmetry considerations. The second module is a parameter-free high-dimensional data visualization technique that displays the projected points on a topographic map with hypsometric colors based on the generalized U-matrix. The third module is the clustering method itself, whose parameters are non-critical. The clustering can be verified by the visualization and vice versa.
Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, DOI: 10.1007/978-3-658-20540-9, 2018.
Projection Based Clustering
Thrun, M. C., Ultsch, A.: Projection based Clustering, Conf. Int. Federation of Classification Societies (IFCS), DOI: 10.13140/RG.2.2.13124.53124, Tokyo, 2017.
DataVisualizations
Various visualizations of high-dimensional data are presented here, such as the heat map and silhouette plot for grouped data, visualizations of the distribution of distances, the scatter-density plot for two variables, the Shepard density plot, and many more. Additionally, ‘DataVisualizations’ makes it possible to inspect the distribution of each feature of a dataset visually through the combination of four methods.
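The following minimal Python sketch shows what the Shepard density plot mentioned above is built from. It is not code from the R package. For every pair of points, it computes the distance in the input space and the distance in a two-dimensional projection, then the Pearson correlation between the two. A Shepard density plot would draw these pairs as a scatter plot colored by point density. The function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points given as sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shepard_pairs(data, projection):
    """Return (input-space distance, projected distance) for every pair of points."""
    return [
        (euclidean(data[i], data[j]), euclidean(projection[i], projection[j]))
        for i, j in combinations(range(len(data)), 2)
    ]


def pearson(xs, ys):
    """Pearson correlation; close to 1.0 means projected distances grow linearly with input distances."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Usage: project 3D points onto their first two coordinates.
data = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0), (0.0, 2.0, 1.0), (3.0, 3.0, 0.5)]
projection = [(x, y) for x, y, _ in data]
pairs = shepard_pairs(data, projection)
input_d, output_d = zip(*pairs)
print(len(pairs), pearson(input_d, output_d))
```

Points far below the diagonal of such a plot mark pairs whose distances the projection shrinks. This is the kind of error that the generalized U-matrix makes visible.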
Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D., Wernecke, K.-D. (Eds.), Innovations in Classification, Data Science, and Information Systems (Vol. 27, pp. 91-100), Springer, Berlin, Germany, 2005.
Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena (accepted), Foundation of the Cracow University of Economics, Zakopane, Poland, 2018.
Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence (Ultsch, A. & Huellermeier, E., Eds.), doctoral dissertation, Springer, Heidelberg, ISBN: 978-3-658-20539-3, DOI: 10.1007/978-3-658-20540-9, 2018.
Umatrix
Interactive R tool for ESOM computation, U-matrix and P-matrix generation, as well as U*-matrix generation and automatic island cutting with interactive clustering. Coming soon to CRAN; a beta version is already available on this website. The following packages must be installed (Imports): Rcpp, ggplot2, shiny, ABCanalysis, shinyjs, reshape2, fields, plyr, abind, tcltk, png, tools, grid, rgl
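The core quantity behind a U-matrix can be sketched in a few lines of Python. This is an illustration, not the Umatrix package's code, and it rests on several assumptions:

- The input is a rectangular grid of already trained weight vectors.
- Each neuron is compared with its four direct grid neighbors.
- The map optionally wraps around at the borders (toroidal map).
- The U-height is the sum of distances to those neighbors.

`u_heights` is a hypothetical name.

```python
import math


def u_heights(weights, toroid=True):
    """Sum of distances from each neuron's weight vector to its four grid neighbors.

    weights[r][c] is the weight vector of the neuron in row r, column c.
    """
    rows, cols = len(weights), len(weights[0])
    heights = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if toroid:
                    # wrap around: a toroidal map has no borders
                    rr, cc = rr % rows, cc % cols
                elif not (0 <= rr < rows and 0 <= cc < cols):
                    continue  # neighbor lies outside a planar map
                total += math.dist(weights[r][c], weights[rr][cc])
            heights[r][c] = total
    return heights


# Usage: a 2 x 3 map whose right column lies far away from the rest in data space.
grid = [
    [(0.0, 0.0), (0.0, 0.1), (5.0, 5.0)],
    [(0.1, 0.0), (0.1, 0.1), (5.0, 5.1)],
]
for row in u_heights(grid, toroid=False):
    print(row)
```

High U-heights ("mountains") separate groups of neurons with similar weights ("valleys"). Some implementations average the neighbor distances instead of summing them.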
Thrun, M. C., Lerch, F., Lötsch, J., & Ultsch, A.: Visualization and 3D Printing of Multivariate Data of Biomarkers, Proc. of International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, Plzen, 2016.
AdaptGauss
For a given data vector, the package provides a density estimate according to PDE [Ultsch 2005]. In an interactive tool, a Gaussian mixture model (GMM) can be fitted manually or automatically (expectation-maximization algorithm) with the help of this density estimate. The GMM can be verified via a QQ plot or a chi-square test. Boundaries between the components of the GMM are calculated using Bayes’ theorem.
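For the automatic route, the sketch below fits a one-dimensional GMM with plain expectation-maximization. It then locates the Bayes boundary between two components: the point where the weighted densities w1·N(x | m1, s1) and w2·N(x | m2, s2), and hence the posterior probabilities, are equal. It is a minimal Python illustration, not AdaptGauss code. The function names, the fixed iteration count, and the bisection search are assumptions. The bisection presumes that the weighted densities cross exactly once between the two means.

```python
import math
import random


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def em_gmm_1d(data, means, sds, weights, iterations=100):
    """Refine an initial one-dimensional GMM with expectation-maximization."""
    k = len(means)
    means, sds, weights = list(means), list(sds), list(weights)
    for _ in range(iterations):
        # E-step: posterior probability (responsibility) of each component for each point
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], sds[j]) for j in range(k)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and standard deviation of each component
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)
    return means, sds, weights


def bayes_boundary(m1, s1, w1, m2, s2, w2, tol=1e-9):
    """Bisection for the point between m1 < m2 where both weighted components are equally likely."""
    def diff(x):
        return w1 * normal_pdf(x, m1, s1) - w2 * normal_pdf(x, m2, s2)

    lo, hi = m1, m2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(lo) * diff(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Usage on synthetic data drawn from two normal distributions.
random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(300)] + [random.gauss(5.0, 1.0) for _ in range(300)]
means, sds, weights = em_gmm_1d(data, [-1.0, 6.0], [1.0, 1.0], [0.5, 0.5])
print(means, sds, weights)
print(bayes_boundary(means[0], sds[0], weights[0], means[1], sds[1], weights[1]))
```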
Ultsch, A., Thrun, M. C., Hansen-Goos, O., Lötsch, J.: Identification of Molecular Fingerprints in Human Heat Pain Thresholds by Use of an Interactive Mixture Model R Toolbox (AdaptGauss), International Journal of Molecular Sciences, doi:10.3390/ijms161025897, 2015.
Thrun, M. C., Ultsch, A.: Models of Income Distributions for Knowledge Discovery, European Conference on Data Analysis, DOI: 10.13140/RG.2.1.4463.0244, Colchester, 2015.
ABC Analysis
For a given data set, the package provides a new method, implemented in R, for calculating precise boundaries between subgroups that are easy to interpret. Closely related to the Lorenz curve, the ABC curve visualizes the data by graphing the cumulative distribution function. Based on this curve, the algorithm calculates the optimal limits by exploiting the mathematical properties of the distribution of the analyzed items. The data must consist of positive values and are divided into three disjoint subsets A, B, and C:

- subset A contains the very profitable values, i.e., the largest data values (“the most important”);
- subset B contains the values for which the yield equals the effort required to obtain it;
- subset C contains the non-profitable values, i.e., the smallest data values (“the trivial”).
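The following Python code is a simplified sketch, not the ABCanalysis package's algorithm. It builds the ABC curve from the values sorted in descending order: the share of items taken is on the x-axis and the cumulative share of the total is on the y-axis. On that curve it locates two characteristic points:

- the point closest to the ideal point (0, 1);
- the break-even point, beyond which the curve's slope drops below 1, i.e., an additional item contributes less than its share of the effort.

As a simplification, A is everything up to the earlier of the two points and B everything up to the later one. The published procedure derives its limits more carefully. `abc_sketch` is a hypothetical name.

```python
def abc_sketch(values):
    """Simplified ABC partition of positive values (illustration, not the package algorithm)."""
    xs = sorted(values, reverse=True)
    n, total = len(xs), float(sum(xs))
    # ABC curve: effort = share of items taken, gain = cumulative share of the total
    effort, gain, cum = [], [], 0.0
    for i, x in enumerate(xs, start=1):
        cum += x
        effort.append(i / n)
        gain.append(cum / total)
    # number of items at the curve point closest to the ideal point (0, 1)
    n_ideal = 1 + min(range(n), key=lambda i: effort[i] ** 2 + (1.0 - gain[i]) ** 2)
    # break-even: items whose share of the total (x / total) is at least
    # their share of the effort (1 / n), i.e. the curve's slope is still >= 1
    n_even = sum(1 for x in xs if x * n >= total)
    n_a, n_b = min(n_ideal, n_even), max(n_ideal, n_even)
    return xs[:n_a], xs[n_a:n_b], xs[n_b:]


a, b, c = abc_sketch([50, 40, 30, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 1, 1, 1, 1, 1, 1])
print("A:", a, "B:", b, "C:", c)
```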
Ultsch, A., Lötsch, J.: Computed ABC analysis for rational selection of most informative variables in multivariate data, PLoS One, 2015.
Vademecum
The data mining suite “Vademecum” was developed as part of a student project. It supports and guides users through the knowledge discovery process and helps them avoid mistakes. For further information, please visit the SourceForge project.

Databionic ESOM Tools
As part of a project group, we developed the Databionic ESOM Tools, a software package for training, visualization, and interactive analysis of emergent self-organizing maps (ESOM). The software is available under the GPL. For further information, please visit the SourceForge project.
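The training loop of an emergent self-organizing map follows the usual online SOM update: find the best-matching neuron, then pull it and its grid neighbors toward the data point. The Python sketch below is not code from the Java ESOM Tools. It assumes a toroidal grid, a Gaussian neighborhood, and a radius and learning rate that shrink linearly. These schedules and all function names are illustrative choices. Real ESOMs use large maps with thousands of neurons; the usage example uses a tiny map so that it runs quickly.

```python
import math
import random


def torus_distance(a, b, rows, cols):
    """Grid distance between neurons a=(r, c) and b on a map whose borders wrap around."""
    dr, dc = abs(a[0] - b[0]), abs(a[1] - b[1])
    return math.hypot(min(dr, rows - dr), min(dc, cols - dc))


def best_match(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_esom(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    rng = random.Random(seed)
    # initialize every neuron with a copy of a randomly drawn data point
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / max(epochs - 1, 1)
        radius = radius0 + (1.0 - radius0) * frac  # shrinks linearly to 1
        lr = lr0 + (0.05 - lr0) * frac             # shrinks linearly to 0.05
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            bmu = best_match(weights, x)
            for node, w in weights.items():
                d = torus_distance(node, bmu, rows, cols)
                if d > radius:
                    continue
                h = math.exp(-(d * d) / (2.0 * radius * radius))  # Gaussian neighborhood
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Usage: two well-separated 2D clusters on an 8 x 12 toroidal map.
rng = random.Random(42)
data = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)]
        + [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(30)])
weights = train_esom(data, rows=8, cols=12)
print(best_match(weights, (0.0, 0.0)), best_match(weights, (10.0, 10.0)))
```

The toroidal grid avoids border effects, because every neuron has the same number of neighbors.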
Ultsch, A., Mörchen, F.: ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM, Technical Report No. 46, Dept. of Mathematics and Computer Science, University of Marburg, Germany, 2005.
Databionic MusicMiner
As part of a project group, we developed the Databionic MusicMiner, a program that computes the similarity of music pieces from their sound and, based on this, displays a music collection as a map. The software is available under the GPL. For further information, please visit the SourceForge project.
Mörchen, F., Ultsch, A., Thies, M., Löhken, I., Nöcker, M., Stamm, C., Efthymiou, N., Kümmerer, M.: MusicMiner: Visualizing timbre distances of music as topographical maps, Technical Report No. 47, Dept. of Mathematics and Computer Science, University of Marburg, Germany, 2005.
Time Series Knowledge Mining
Time Series Knowledge Mining (TSKM) is a methodology for finding understandable patterns in multivariate time series.  Download  
Mörchen, F.: Time Series Knowledge Mining, PhD thesis, Dept. of Mathematics and Computer Science, University of Marburg, Germany, 2006.
Pareto Density Estimation
Pareto Density Estimation (PDE) is an information-optimal estimate of the empirical probability density. We provide an implementation for R in the AdaptGauss package.
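The idea behind PDE can be sketched in one dimension. The density at a point is the fraction of the data within a fixed radius of that point, divided by the width of the interval. This radius, the Pareto radius, is chosen so that such an interval contains roughly 20% of the data (the 80/20 idea behind the name). Here it is approximated heuristically by the 20th percentile of all pairwise distances. That heuristic, the `share` parameter, and the function names are assumptions for illustration, and AdaptGauss may compute the radius differently.

```python
def pareto_radius(data, share=0.2):
    """Heuristic Pareto radius: the `share` quantile of all pairwise distances."""
    dists = sorted(abs(a - b) for i, a in enumerate(data) for b in data[i + 1:])
    return dists[min(len(dists) - 1, int(share * len(dists)))]


def pde(data, points, radius=None):
    """Density at each query point = share of data within +/- radius, divided by the interval width."""
    r = pareto_radius(data) if radius is None else radius
    if r <= 0:
        raise ValueError("radius must be positive")
    n = len(data)
    return [sum(1 for x in data if abs(x - p) <= r) / (n * 2.0 * r) for p in points]


data = list(range(100))  # roughly uniform on [0, 100), so the true density is about 0.01
print(pareto_radius(data), pde(data, [10.0, 50.0, 90.0]))
```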
Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D., Wernecke, K.-D. (Eds.), Innovations in Classification, Data Science, and Information Systems, Proc. GfKl 2003, pp. 91-100, Springer, Berlin, 2005.
Persist Time Series Discretization
The Persist algorithm discretizes time series into states of optimal duration. In contrast to conventional static histogram methods, it uses the temporal order of the values to optimize the bins. We provide an implementation for Matlab under the GPL. Download.
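The following Python sketch illustrates our reading of the idea; it is not the Matlab implementation. It works in three steps:

1. A set of cut points turns the series into a sequence of states.
2. Each state is scored by comparing its self-transition probability P(s_t = j | s_(t-1) = j) with its marginal probability P(s_t = j), using a signed symmetric Kullback-Leibler divergence.
3. Cut points are added greedily from quantile-based candidates so that the mean score is maximal.

The candidate rule, the greedy search, and all names are simplifying assumptions.

```python
import math


def discretize(series, cuts):
    """Symbol of each value = number of cut points below it."""
    return [sum(1 for c in cuts if x > c) for x in series]


def sym_kl_bernoulli(p, q, eps=1e-9):
    """Symmetric Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = min(max(p, eps), 1 - eps), min(max(q, eps), 1 - eps)
    return (p - q) * (math.log(p / q) - math.log((1 - p) / (1 - q)))


def persistence_score(symbols, k):
    """Mean signed divergence between self-transition and marginal probability of each state."""
    n = len(symbols)
    scores = []
    for j in range(k):
        prior = symbols.count(j) / n
        starts = [t for t in range(n - 1) if symbols[t] == j]
        if not starts:
            return float("-inf")  # (almost) empty state: reject this set of cuts
        stay = sum(1 for t in starts if symbols[t + 1] == j) / len(starts)
        sign = 1.0 if stay >= prior else -1.0
        scores.append(sign * sym_kl_bernoulli(stay, prior))
    return sum(scores) / k


def persist(series, n_states, n_candidates=20):
    """Greedily add the quantile-based cut that most increases the persistence score."""
    ordered = sorted(series)
    candidates = sorted({ordered[i * len(ordered) // n_candidates] for i in range(1, n_candidates)})
    cuts = []
    for _ in range(n_states - 1):
        remaining = [c for c in candidates if c not in cuts]
        best = max(remaining, key=lambda c: persistence_score(
            discretize(series, sorted(cuts + [c])), len(cuts) + 2))
        cuts = sorted(cuts + [best])
    return cuts


# Usage: a series that alternates between two long-lasting levels with small jitter.
series = ([0.1 * (i % 3) for i in range(50)] + [10 + 0.1 * (i % 3) for i in range(50)]) * 2
print(persist(series, n_states=2))
```

A histogram-based method would place cuts from the value distribution alone. Here, the cut that keeps the two long-lasting levels intact scores highest.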
Mörchen, F., Ultsch, A.: Optimizing Time Series Discretization for Knowledge Discovery, in Grossman, R. L., Bayardo, R., Bennett, K., Vaidya, J. (Eds.), Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, pp. 660-665, 2005.
Audio Feature Extraction
The analysis of music data is often based on sound features calculated on short time windows. Well-known examples are the Mel Frequency Cepstral Coefficients (MFCC). As part of a project group, we developed flexible software for computing a large number of such sound features. We provide an implementation for Matlab under the GPL on request.
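The sketch below makes the notion of short-window features concrete. It splits a signal into overlapping windows and computes two simple features per window: the root-mean-square energy and the zero-crossing rate. These are much simpler than MFCCs, which additionally require a mel-scaled filter bank, a logarithm, and a discrete cosine transform. The window size, hop size, and function names are illustrative assumptions, not taken from the Matlab software.

```python
import math


def frames(signal, size, hop):
    """Overlapping windows of `size` samples, starting every `hop` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def rms(frame):
    """Root-mean-square energy of one window."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of neighboring sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_features(signal, size=1024, hop=512):
    return [(rms(f), zero_crossing_rate(f)) for f in frames(signal, size, hop)]


# Usage: one second of a 440 Hz tone sampled at 8 kHz.
sr = 8000
signal = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
feats = short_time_features(signal)
print(len(feats), feats[0])
```

For a pure tone, the zero-crossing rate is about twice the frequency divided by the sampling rate, and the RMS value is the amplitude divided by the square root of 2.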
Mörchen, F., Ultsch, A., Thies, M., Löhken, I.: Modelling timbre distance with temporal statistics from polyphonic music, IEEE Transactions on Speech and Audio Processing 14(1), pp. 81-90, 2006.
DWT/DFT time series feature extraction
In terms of energy preservation, the best selection of coefficients from the Discrete Wavelet Transform (DWT) or the Discrete Fourier Transform (DFT) of a time series is to take them in descending order of magnitude. For a set of time series, as used for clustering or classification, this leads to poorly comparable representations, because different coefficients can be selected for each time series. We have therefore proposed a global selection strategy that combines a comparable representation with good energy preservation. We provide an implementation for Matlab under the GPL: Download.
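The difference between per-series and global coefficient selection can be seen with a naive DFT in Python; the same idea applies to DWT coefficients.

- `local_selection` keeps each series' largest coefficients, so different series may end up with different indices.
- `global_selection` ranks indices by their mean energy over the whole data set, so every series is represented by the same coefficients.

Ranking by mean energy is an assumed, simple global criterion, and the technical report may define its strategy differently. All names are hypothetical.

```python
import cmath
import math


def dft(series):
    """Naive DFT; keeps coefficients 0..n/2 (the rest mirror them for real-valued input)."""
    n = len(series)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(series))
            for k in range(n // 2 + 1)]


def local_selection(spectrum, k):
    """Per-series choice: indices of the k largest-magnitude coefficients."""
    return sorted(sorted(range(len(spectrum)), key=lambda i: abs(spectrum[i]), reverse=True)[:k])


def global_selection(spectra, k):
    """One shared choice: the k indices with the largest mean energy over all series."""
    m = len(spectra[0])
    mean_energy = [sum(abs(sp[i]) ** 2 for sp in spectra) / len(spectra) for i in range(m)]
    return sorted(sorted(range(m), key=lambda i: mean_energy[i], reverse=True)[:k])


n = 32
dataset = [
    [math.sin(2 * math.pi * 3 * t / n + phase) + 0.3 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
    for phase in (0.0, 0.5, 1.0)
]
dataset.append([0.2 * math.sin(2 * math.pi * 3 * t / n) + math.sin(2 * math.pi * 7 * t / n) for t in range(n)])
spectra = [dft(s) for s in dataset]
chosen = global_selection(spectra, k=2)
features = [[sp[i] for i in chosen] for sp in spectra]  # same indices for every series
print(chosen, [local_selection(sp, 2) for sp in spectra])
```

The local choice gives the fourth series different coefficients from the others. The global choice keeps all feature vectors comparable, at the cost of some energy for individual series.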
Mörchen, F.: Time series feature extraction for data mining using DWT and DFT, Technical Report No. 33, Dept. of Mathematics and Computer Science, University of Marburg, Germany, 2003.
LaTeX/PDF Reports
Spin3D
Spin3D: an OpenGL visualization tool for high-dimensional data.