Hierarchical Data Format 5 (HDF5) is a file format, introduced in 1998, capable of storing large and complex collections of data. It is used in gravitational and plasma physics, Earth science research, weather services, software engineering, biomedical informatics, and other fields. As new data acquisition hardware produces ever larger datasets (for example, sequencing data), the need to query and access metadata and partial or full datasets efficiently (for example, through parallel I/O) becomes more important. HDF5 stores data in a hierarchy similar to the UNIX file system, and its data model supports a rich variety of data types and data space organizations. APIs and wrappers currently exist for Java, .NET, Python, C, and FORTRAN.
The goal of this project is to build a wrapper that enables access to HDF5 data from Smalltalk. This binding could open Smalltalk to many scientific domains and users for whom pure object technology is currently unknown.
The student will need to learn details of the HDF5 format, such as datasets and compound (composite) data types.
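To make these concepts concrete, the following is a minimal sketch using the existing Python wrapper, h5py, together with NumPy. It is not a proposal for the Smalltalk API. It only illustrates the hierarchy of groups, a dataset with attached metadata, a partial read, and a compound data type. The group, dataset, and field names (experiment/run1, signal, particles, id, mass) are hypothetical, and the file is kept in memory so nothing is written to disk.

```python
import numpy as np
import h5py

# Open an in-memory HDF5 file: the "core" driver with backing_store=False
# never writes to disk, so the file name is only a label.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    # Groups behave like UNIX directories; intermediate groups are created.
    run = f.create_group("experiment/run1")

    # A dataset is a typed, n-dimensional array stored inside a group.
    signal = run.create_dataset("signal", data=np.arange(100, dtype="f8"))

    # Attributes carry metadata alongside the data.
    signal.attrs["units"] = "volts"

    # A compound data type resembles a C struct: named, typed fields.
    particle = np.dtype([("id", "i4"), ("mass", "f8")])
    table = run.create_dataset("particles", shape=(3,), dtype=particle)
    table[...] = np.array([(1, 0.5), (2, 1.5), (3, 2.5)], dtype=particle)

    # Objects are addressed by path, and slicing reads only that part.
    window = f["/experiment/run1/signal"][10:15]

    # Read the whole compound dataset, then select one field.
    masses = table[...]["mass"]
    units = signal.attrs["units"]

print(window, masses, units)
```

A Smalltalk binding would need to expose equivalent operations (path-based lookup, slicing, and field access), though how they map onto Smalltalk objects and messages is a design decision for the project.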
Benefits to the Student
The student would learn about efficient data systems, implement an API, and experiment with large scientific data in Smalltalk.
Benefits to the Community
The Smalltalk community could attract more users by staying connected to big data analytics and by providing access to an efficient data format currently used in research and business.