Here you will find software created by the AG Datenbionik for scientific purposes, which we make publicly available. Please cite the corresponding publications when using them.
|DataIO||R||GPL||Github||Alfred Ultsch, Florian Lerch, Michael Thrun, Catharina Lippmann, Felix Pape, Onno Hansen-Goos, Sabine Herda|
|DataVisualizations||R||GPL||CRAN||Michael Thrun, Felix Pape, Onno Hansen-Goos, Fredericke Matz, Alfred Ultsch|
|ProjectionBasedClustering||R||GPL||CRAN||Michael Thrun, Florian Lerch, Felix Pape, Kristian Nybo, Jarkko Venna|
|GeneralizedUmatrix||R||GPL||CRAN||Michael Thrun, Alfred Ultsch|
|Umatrix||R||GPL||Download, Manual||Florian Lerch, Michael Thrun, Alfred Ultsch|
|AdaptGauss: Gaussian Mixture Models (GMM)||R||GPL||CRAN||Michael Thrun, Onno Hansen-Goos, Rabea Griese, Catharina Lippmann, Florian Lerch, Jörn Lötsch, Alfred Ultsch|
|ABCanalysis||R||GPL||CRAN, Online||Michael Thrun, Florian Lerch, Jörn Lötsch, Alfred Ultsch|
|Vademecum||Java||GPL||SourceForge Project||Torben Rühl, Steffen Springer, Burcu Dalmis, Jan Kohlhof, Dirk Schäfer|
|Databionic ESOM Tools||Java||GPL||SourceForge Project||Christian Stamm, Mario Nöcker, Fabian Mörchen, and many others|
|Databionic MusicMiner||Java||GPL||SourceForge Project||Mario Nöcker, Christian Stamm, Fabian Mörchen, Niko Efthymiou, Michael Thies, Ingo Löhken, and many others|
|Time Series Knowledge Mining||Matlab||GPL||Download||Fabian Mörchen|
|Pareto Density Estimation||R||GPL||CRAN||Michael Thrun, Onno Hansen-Goos, Rabea Griese, Catharina Lippmann, Jörn Lötsch, Alfred Ultsch|
|Persist Time Series Discretization||Matlab||GPL||Download||Fabian Mörchen|
|Audio Feature Extraction||Matlab||GPL||On request||Ingo Löhken, Michael Thies, Fabian Mörchen|
|DWT/DFT time series feature extraction||Matlab||GPL||Download||Fabian Mörchen|
|LaTeX/PDF Reports||Matlab||GPL||Download||Fabian Mörchen|
Projections from a high-dimensional data space onto a two-dimensional plane are used to detect structures, such as clusters, in multivariate data. The generalized Umatrix is able to visualize errors of these two-dimensional scatter plots by using a 3D topographic map.
Ultsch, A., & Thrun, M. C.: Credible Visualizations for Planar Projections, in Cottrell, M. (Ed.), 12th International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM), IEEE Xplore, France, 2017.
Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, https://doi.org/10.1007/978-3-658-20540-9, 2018.
The databionic swarm (DBS) is a swarm system that adapts itself to structures of high-dimensional data, such as natural clusters characterized by distance- and/or density-based structures in the data space. The first module is the parameter-free projection method Pswarm, which exploits the concepts of self-organization and emergence, game theory, swarm intelligence and symmetry considerations. The second module is a parameter-free high-dimensional data visualization technique, which generates projected points on a topographic map with hypsometric colors based on the generalized U-matrix. The third module is the clustering method itself, which has non-critical parameters. The clustering can be verified by the visualization and vice versa.
Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, DOI:10.1007/978-3-658-20540-9, 2018.
Projection Based Clustering
Various visualizations of high-dimensional data, such as the heat map and silhouette plot for grouped data, visualizations of the distribution of distances, the scatter-density plot for two variables, the Shepard density plot and many more, are presented here. Additionally, ‘DataVisualizations’ makes it possible to inspect the distribution of each feature of a dataset visually through the combination of four methods.
Thrun, M.C., Ultsch, A.: Projection based Clustering, Conf. Int. Federation of Classification Societies (IFCS), DOI:10.13140/RG.2.2.13124.53124, Tokyo, 2017.
Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, In Baier, D. & Wernecke, K. D. (Eds.), Innovations in classification, data science, and information systems, (Vol. 27, pp. 91-100), Berlin, Germany, Springer, 2005.
Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, Vol. accepted, Foundation of the Cracow University of Economics, Zakopane, Poland, 2018.
Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, (Ultsch, A. & Huellermeier, E. Eds., 10.1007/978-3-658-20540-9), Doctoral dissertation, Heidelberg, Springer, ISBN: 978-3658205393, 2018.
Interactive R tool for ESOM computation, U-matrix and P-matrix generation, as well as U*-matrix generation and automatic island extraction with interactive clustering. Coming soon to CRAN; a beta version is already available in advance on this website. The following packages have to be installed (Imports): Rcpp, ggplot2, shiny, ABCanalysis, shinyjs, reshape2, fields, plyr, abind, tcltk, png, tools, grid, rgl
Thrun, M. C., Lerch, F., Lötsch, J., & Ultsch, A.: Visualization and 3D Printing of Multivariate Data of Biomarkers, Proc. of International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, Plzen, 2016.
For a given data vector, the package provides a density estimate according to PDE [Ultsch 2005]. In an interactive tool, a Gaussian mixture model (GMM) can be generated manually or automatically (expectation-maximization algorithm) via the visualization of this density estimate. The GMM can be verified via a QQplot or a chi-square distribution test. Boundaries between the components of the GMM are calculated using Bayes’ theorem.
Ultsch, A., Thrun, M.C., Hansen-Goos, O., Lötsch, J.: Identification of Molecular Fingerprints in Human Heat Pain Thresholds by Use of an Interactive Mixture Model R Toolbox (AdaptGauss), International Journal of Molecular Sciences, doi:10.3390/ijms161025897, 2015.
Thrun, M.C., Ultsch, A.: Models of Income Distributions for Knowledge Discovery, European Conference on Data Analysis, DOI 10.13140/RG.2.1.4463.0244, Colchester, 2015.
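The idea of a Bayes boundary between two mixture components can be sketched as follows. This is a minimal illustration in Python, not the AdaptGauss implementation: the function names are hypothetical, and it assumes exactly two Gaussian components whose posteriors cross once between the two means.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior(x, means, sds, weights):
    """Posterior probability of each component at x via Bayes' theorem."""
    joint = [w * gaussian_pdf(x, m, s) for m, s, w in zip(means, sds, weights)]
    total = sum(joint)
    return [j / total for j in joint]


def bayes_boundary(means, sds, weights, tol=1e-9):
    """Find x between two component means where both posteriors are equal.

    Hypothetical helper: uses bisection and assumes two components with
    means[0] < means[1] and a single posterior crossing between the means.
    """
    lo, hi = means[0], means[1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        p = posterior(mid, means, sds, weights)
        if p[0] > p[1]:
            lo = mid  # component 0 still dominates, boundary lies to the right
        else:
            hi = mid  # component 1 dominates (or tie), boundary lies to the left
    return 0.5 * (lo + hi)
```

For two components with equal weights and equal standard deviations, the boundary falls halfway between the means; unequal weights shift it toward the lighter component.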
For a given data set, the package provides a new method in the R programming language for calculating precise, easily interpretable boundaries between subgroups. Closely related to the Lorenz curve, the ABC curve visualizes the data by graphing the cumulative distribution function. Based on an ABC analysis, the algorithm uses the ABC curve to calculate the optimal limits by exploiting the mathematical properties of the distribution of the analyzed elements. The data consist of positive values and are divided into three disjoint subsets A, B and C: subset A contains the very profitable values, i.e., the largest data values (“the most important”); subset B contains the values at which the profit equals the effort required to obtain it; and subset C contains the non-profitable values, i.e., the smallest data values (“the trivial”).
Ultsch, A., Lötsch, J.: Computed ABC analysis for rational selection of most informative variables in multivariate data, PLoS One, 2015.
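To make the ABC curve concrete, the sketch below builds it in Python under a common reading: values are sorted in descending order, and the cumulative share of the total ("yield") is plotted against the share of items considered ("effort"). The function names are hypothetical, and choosing the point nearest to the ideal point (0, 1) is only one simple illustrative criterion; the limits computed by the ABCanalysis package may be derived differently.

```python
def abc_curve(values):
    """Return (effort, yield) points of an ABC curve for positive values."""
    ordered = sorted(values, reverse=True)  # largest values first
    total = sum(ordered)
    n = len(ordered)
    effort, cum_yield, running = [], [], 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        effort.append(i / n)               # share of items used so far
        cum_yield.append(running / total)  # share of the total they cover
    return effort, cum_yield


def nearest_to_ideal(effort, cum_yield):
    """Index of the curve point closest to the ideal point (0, 1).

    Illustrative criterion only, not necessarily the package's rule.
    """
    dists = [(e ** 2 + (1.0 - y) ** 2) ** 0.5 for e, y in zip(effort, cum_yield)]
    return min(range(len(dists)), key=dists.__getitem__)


effort, cum_yield = abc_curve([5, 50, 3, 30, 2, 10])
candidate = nearest_to_ideal(effort, cum_yield)
```

In this toy data set the two largest of six values already cover 80% of the total, which is the kind of imbalance an ABC analysis is meant to expose.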
|As part of a student project, the data mining suite “Vademecum” was developed. It is software that supports and guides the user through the knowledge discovery process and helps prevent mistakes. For all further information please visit the SourceForge Project.|
Databionic ESOM Tools
|As part of a project group, we developed the Databionic ESOM Tools, a software package for training, visualization and interactive analysis of emergent self-organizing feature maps. The software is available under the GPL. For all further information please visit the SourceForge Project.|
|Ultsch, A., Mörchen, F.: ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM, Technical Report No. 46, Dept. of Mathematics and Computer Science, University of Marburg, Germany, (2005)|
|In the context of a project group we developed the Databionic MusicMiner. It is a program that calculates the similarity of music pieces from the sound and displays a music collection as a map based on this. The software is available under the GPL. For all further information please visit the SourceForge Project.|
|Mörchen, F., Ultsch, A., Thies, M., Löhken, I., Nöcker, M., Stamm, C., Efthymiou, N., Kümmerer, M.: MusicMiner: Visualizing timbre distances of music as topographical maps, Technical Report No. 47, Dept. of Mathematics and Computer Science, University of Marburg, Germany, (2005)|
Time Series Knowledge Mining
|Time Series Knowledge Mining (TSKM) is a methodology for finding understandable patterns in multivariate time series.||Download|
|Mörchen, F.: Time Series Knowledge Mining, PhD thesis, Dept. of Mathematics and Computer Science, University of Marburg, Germany, (2006)|
Pareto Density Estimation
|The Pareto Density Estimation is an information-optimal estimation of the empirical probability density. We provide an implementation for R in the AdaptGauss package.|
|Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D.; Wernecke, K. D., (Eds), Innovations in classification, data science, and information systems, Proc Gfkl 2003, pp 91-100, Springer, Berlin, 2005.|
Persist Time Series Discretization
|The Persist algorithm allows a discretization of time series into states of optimal duration. In contrast to conventional static histogram methods, the temporal sequence of values is used to optimize the bins. We provide an implementation for Matlab under the GPL. Download.|
|Mörchen, F., Ultsch, A.: Optimizing Time Series Discretization for Knowledge Discovery, in Grossman, R.L., Bayardo, R., Bennet, K., Vaidya, J. (Eds), Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, (2005), pp. 660-665|
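The difference between a static binning and one that respects temporal order can be illustrated with a small Python sketch. It does not implement Persist or its scoring function; it only measures how long the resulting states last for a given set of bin boundaries, using hypothetical helper names.

```python
from bisect import bisect_right
from itertools import groupby


def discretize(series, breakpoints):
    """Map each value to the index of the bin it falls into."""
    cuts = sorted(breakpoints)
    return [bisect_right(cuts, x) for x in series]


def mean_state_duration(symbols):
    """Average length of runs of identical consecutive symbols."""
    runs = [len(list(group)) for _, group in groupby(symbols)]
    return sum(runs) / len(runs)


series = [0.1, 0.2, 0.1, 0.9, 1.1, 1.0, 0.2, 0.1]

# Two candidate boundaries for a two-state discretization.
stable = mean_state_duration(discretize(series, [0.5]))
noisy = mean_state_duration(discretize(series, [0.15]))
```

Here the boundary at 0.5 yields states that last longer on average than the boundary at 0.15, which flips state on small fluctuations. A method that optimizes bins using the temporal sequence would prefer the former, whereas a histogram of the values alone cannot see this difference.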
Audio Feature Extraction
|The analysis of music data is often done on sound features calculated on short time windows. A well-known example are the Mel Frequency Cepstral Coefficients (MFCC). Within the framework of a project group, we have developed flexible software for the computation of a large number of such sound features. We provide an implementation for Matlab under the GPL on request.|
|Mörchen, F., Ultsch, A., Thies, M., Löhken, I.: Modelling timbre distance with temporal statistics from polyphonic music, IEEE Transactions on Speech and Audio Processing 14(1), IEEE, pp. 81-90, 2006.|
DWT/DFT time series feature extraction
|The best selection of coefficients from the Discrete Wavelet Transform (DWT) or the Discrete Fourier Transform (DFT) of time series in terms of energy conservation is in descending order of magnitude. For a set of time series such as those available for clustering or classification, this leads to poorly comparable representations, since different coefficients can be selected per time series. We have therefore proposed a global selection strategy that combines a comparable representation with good energy conservation. We provide an implementation for Matlab under the GPL: Download.|
|Mörchen, F.: Time series feature extraction for data mining using DWT and DFT, Technical Report No. 33, Dept. of Mathematics and Computer Science, University of Marburg, Germany, (2003)|
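The contrast between per-series and global coefficient selection can be sketched in Python as follows. This is an illustration, not the Matlab implementation: the function names are hypothetical, a naive DFT stands in for a proper transform library, and ranking coefficients by their mean energy over all series is assumed here as one plausible global criterion.

```python
import cmath


def dft(series):
    """Naive discrete Fourier transform (O(n^2)), adequate for short examples."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(series))
        for k in range(n)
    ]


def top_k_per_series(coeffs, k):
    """Indices of the k largest-magnitude coefficients of a single series."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return sorted(order[:k])


def top_k_global(all_coeffs, k):
    """Indices of the k coefficients with the largest mean energy over all series.

    Assumed criterion for illustration: energy is the squared magnitude.
    """
    n = len(all_coeffs[0])
    mean_energy = [
        sum(abs(c[i]) ** 2 for c in all_coeffs) / len(all_coeffs) for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: mean_energy[i], reverse=True)
    return sorted(order[:k])
```

Per-series selection can return a different index set for every series, so the resulting feature vectors do not line up. The global variant returns one index set that is applied to all series, trading a little energy per series for comparability.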
|Spin3D – OpenGL Visualization Tool for high dimensional data.|