Software: TINTO - Converting Tidy Data into Image for Classification with 2-Dimensional Convolutional Neural Networks

Abstract

TINTO is an open-source, user-extensible framework for converting tidy data into images by representing the data as characteristic pixels. For this transformation, TINTO implements two dimensionality reduction algorithms, PCA and t-SNE. It also includes a technique borrowed from painting, known as blurring, which adds more ordered information to the image and can improve classification with CNNs.

Citing TINTO: If you use TINTO in your work, please cite the INFFUS paper:

@article{inffus_TINTO,
    title = {A novel deep learning approach using blurring image techniques for Bluetooth-based indoor localisation},
    journal = {Information Fusion},
    author = {Reewos Talla-Chumpitaz and Manuel Castillo-Cara and Luis Orozco-Barbosa and Raúl García-Castro},
    volume = {91},
    pages = {173-186},
    year = {2023},
    issn = {1566-2535},
    doi = {https://doi.org/10.1016/j.inffus.2022.10.011}
}

And the SoftwareX paper:

@article{softwarex_TINTO,
    title = {TINTO: Converting Tidy Data into Image for Classification with 2-Dimensional Convolutional Neural Networks},
    journal = {SoftwareX},
    author = {Manuel Castillo-Cara and Reewos Talla-Chumpitaz and Raúl García-Castro and Luis Orozco-Barbosa},
    year = {2023},
    issn = {2352-7110},
    volume = {22},
    pages = {101391},
    doi = {https://doi.org/10.1016/j.softx.2023.101391}
}

Documentation

You can find all the documentation and source code of TINTO on the OEG GitHub.

Main Features

  • Supports all CSV data in Tidy Data format.
  • For now, the algorithm converts tabular data for binary and multi-class classification problems in machine learning.
  • Input data formats:
    • Tabular files: The input data must be a CSV file that follows the Tidy Data format.
    • Tidy Data: The target (variable to be predicted) should be set as the last column of the dataset. Therefore, the first columns will be the features.
    • All data must be in numerical form. TINTO does not accept data in string or any other non-numeric format.
  • Two dimensionality reduction algorithms are used in image creation: PCA and t-SNE, both from the scikit-learn Python library.
  • The synthetic images are created in black and white, i.e. with a single channel.
  • The synthetic image dimensions can be set as a parameter when creating them.
  • The synthetic images can be created using characteristic pixels or the blurring painting technique (where an overlap of pixels is expressed as either the maximum or the average).
  • Runs on Linux, Windows and macOS systems.
  • Compatible with Python 3.7 or higher.
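
The overlap handling mentioned in the feature list can be illustrated with a small sketch. This is not TINTO's implementation: the function name `render_pixels`, its parameters, and the example coordinates are hypothetical, and the pixel positions are assumed to have been computed already (for example by PCA or t-SNE and then scaled to the image grid). It is also simplified so that overlap happens on a single pixel rather than across blurred regions. It only shows how values that land on the same pixel could be combined as the maximum or the average in a single-channel image.

```python
from collections import defaultdict


def render_pixels(coords, values, size, overlap="maximum"):
    """Place values on a size x size single-channel image (hypothetical helper).

    coords  : list of (row, col) integer pixel positions, one per value
    values  : list of numeric values (assumed already scaled to [0, 1])
    overlap : "maximum" or "average", used when several values share a pixel
    """
    if len(coords) != len(values):
        raise ValueError("coords and values must have the same length")
    if overlap not in ("maximum", "average"):
        raise ValueError("overlap must be 'maximum' or 'average'")

    # Group the values that fall on the same pixel.
    buckets = defaultdict(list)
    for (row, col), value in zip(coords, values):
        if not (0 <= row < size and 0 <= col < size):
            raise ValueError("pixel position outside the image")
        buckets[(row, col)].append(value)

    # Start from an all-zero image and fill only the occupied pixels.
    image = [[0.0] * size for _ in range(size)]
    for (row, col), hits in buckets.items():
        if overlap == "maximum":
            image[row][col] = max(hits)
        else:
            image[row][col] = sum(hits) / len(hits)
    return image


# Hypothetical example: the first two values land on the same pixel (0, 0).
coords = [(0, 0), (0, 0), (2, 1)]
values = [0.2, 0.6, 1.0]
img_max = render_pixels(coords, values, size=3, overlap="maximum")
img_avg = render_pixels(coords, values, size=3, overlap="average")
```

With the maximum rule the shared pixel keeps the larger of the two values, while the average rule stores their mean; pixels that no value reaches stay at zero.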

Input

The following table shows a classic example, the Iris dataset, as the CSV should look for a run:

sepal length  sepal width  petal length  petal width  target
4.9           3.0          1.4           0.2          1
7.0           3.2          4.7           1.4          2
6.3           3.3          6.0           2.5          3
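
As a rough illustration of the input rules, the following sketch reads such a file with Python's standard `csv` module, checks that every cell is numeric, and treats the last column as the target. The helper `load_tidy_csv` is hypothetical and is not part of TINTO; the sample rows are the three shown in the table above.

```python
import csv
import io


def load_tidy_csv(handle):
    """Read a numeric tidy CSV and split it into features and target.

    The last column is treated as the target, as the input rules require.
    Raises ValueError if a row has the wrong width or a non-numeric cell.
    """
    reader = csv.reader(handle)
    header = next(reader)
    features, target = [], []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} columns")
        try:
            numbers = [float(cell) for cell in row]
        except ValueError:
            raise ValueError(f"line {line_no}: non-numeric value found") from None
        features.append(numbers[:-1])
        target.append(numbers[-1])
    return header[:-1], header[-1], features, target


# The three sample rows of the Iris example, as CSV text.
sample = io.StringIO(
    "sepal length,sepal width,petal length,petal width,target\n"
    "4.9,3.0,1.4,0.2,1\n"
    "7.0,3.2,4.7,1.4,2\n"
    "6.3,3.3,6.0,2.5,3\n"
)
feature_names, target_name, X, y = load_tidy_csv(sample)
```

For a file on disk, the same function could be called with `open(path, newline="")` in place of the in-memory sample.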

Output

The following figure shows the output of TINTO: