Computational science training: 2010 projects
This page presents projects to be run by Peter Saveliev for the grant "REU Site: Computational Science Training at Marshall University for Undergraduates in the Mathematical and Physical Sciences" (Principal Investigator: Howard L. Richards). Over the summers of 2010–2012, the Departments of Mathematics, Physics, and Chemistry at Marshall University will jointly host twelve students for ten weeks of instruction and research in computational science.
At the beginning of the summer I suggested three potential projects, described below. I ended up supervising two of them:
The topic of the REU was computational science, so the general goal was to do something new computationally while learning some mathematics. To that end, we used and modified existing software. The main idea of the first project was to use jPlex to compute dimensions of datasets, while the main idea of the second was to use CHomP to compute the homology classes of 3D images along with their persistence. Ironically, jPlex computes persistent homology but lacks relative homology, which had to be implemented, while CHomP has relative homology but persistence had to be implemented. These issues were mostly resolved. Unfortunately, in neither case was there enough time to test the programs on real-life data.
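The notion of persistence mentioned above can be illustrated in its simplest, zero-dimensional form for point clouds (an illustrative, stdlib-only Python sketch, unrelated to the actual jPlex or CHomP code): as the closeness threshold grows, components of the point cloud merge, and each component acquires a birth-death pair.

```python
import math

def persistence_0d(points):
    """Birth-death pairs of connected components as the threshold grows.

    Every point is born at threshold 0; a component dies when it merges
    into another one (Kruskal-style union-find over sorted edges)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # All pairwise distances, sorted: each edge may merge two components.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            pairs.append((0.0, d))   # a component born at 0 dies at d
    pairs.append((0.0, math.inf))    # one component never dies
    return pairs
```

Long-lived pairs (large death values) correspond to well-separated clusters; short-lived ones are noise.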
Digital image analysis
Image analysis and computer vision are concerned with the extraction of meaningful information from digital images. Some of the most prominent applications are in cell analysis, medical image processing, and industrial machine vision.
There exists an abundance of methods for solving various well-defined computer vision tasks, but these methods are highly task-specific and can seldom be reused across a wide range of applications. Our long-term goal is to design a computer vision system "from first principles". These principles will come initially from algebraic topology.
Algebraic topology is a well-established discipline within mathematics. Its main computational tools have recently been implemented as software. However, the theory and these tools apply only to binary images. A framework for the analysis of gray scale images has been developed, but this is just the first step.
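As a minimal illustration of the kind of topological information such tools extract from binary images, here is a stdlib-only Python sketch that counts connected components (the Betti number b0) by flood fill. The function name and the 4-connectivity convention are illustrative choices, not part of CHomP or Pixcavator.

```python
from collections import deque

def betti_0(image):
    """Count connected components (Betti number b0) of a binary image,
    given as a list of rows of 0/1 pixels, using 4-connectivity."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1 and not seen[r][c]:
                count += 1                      # new component found
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:                    # flood fill the component
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count
```

Higher Betti numbers (holes, voids) require the cubical homology machinery that CHomP provides.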
Short term projects:
- developing protocols for applying the framework to specific tasks; software: Pixcavator (Windows);
- developing gray scale analysis of 3D images; software: CHomP (C++).
Long term projects:
- developing new methods that resolve the ambiguity of the boundaries of objects in gray scale images;
- integration of the other computer vision methods into the framework;
- expanding the framework to video (first binary, then gray scale, etc);
- expanding the framework to color images (and other multichannel images);
- application to stereo vision.
Current project: 3D image analysis.
Image-to-image search
Image-to-image search is the exact opposite of the text-to-image search we are all familiar with. Given an image, a visual image search engine finds images in a collection that are similar, in some way, to the query image.
So far, these engines exist mostly as experimental prototypes. Most of these demo programs work with small collections of images and often lack an upload feature, which makes testing impossible. When testing is possible, the results are questionable.
The approach is based on methods related to the digital image analysis project: the distribution of the sizes of the objects in an image is compared to those of other images.
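This kind of matching can be sketched roughly as follows (illustrative Python, not PxSearch itself): bin the object sizes into a normalized histogram and compare histograms with the L1 distance. The bin edges and the choice of distance are hypothetical.

```python
def size_histogram(object_sizes, bin_edges):
    """Histogram of object sizes (e.g. pixel areas), normalized so
    images with different object counts remain comparable."""
    counts = [0] * (len(bin_edges) - 1)
    for s in object_sizes:
        for k in range(len(counts)):
            if bin_edges[k] <= s < bin_edges[k + 1]:
                counts[k] += 1
                break
    total = sum(counts) or 1
    return [c / total for c in counts]

def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms: 0 means identical
    distributions, 2 is the maximum (disjoint supports)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

Images would then be ranked by increasing distance from the query image's histogram.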
Short term projects, software PxSearch (Windows):
- creating datasets for various medium-size image collections;
- developing a comprehensive review of the literature on the subject;
- evaluating the quality of the matching;
- modifying the matching criteria (bins for the distributions, thresholds for noise, etc);
- analyzing the topology of the datasets (project below).
Long term projects:
- creating datasets for large-size image collections;
- matching images based on the complete information about their topologies (project above);
- developing search based on partial matching.
Topological data analysis
Suppose we have conducted 1000 experiments, with a set of 100 measurements in each. Then each experiment is a string of 100 numbers, or simply a vector of dimension 100. The result is a collection of 1000 disconnected points, called a point cloud, in a 100-dimensional vector space.
It is impossible to visualize this data, as any representation one can see is limited to dimension 3. Yet we still need to answer a few simple topological questions about the object behind the point cloud:
- Is it one piece or more?
- Is there a tunnel?
- Or a void?
- And what about possible 100-dimensional topological features?
Statistics answers the first question through clustering (and related approaches). A common topological approach to the problem is the following. For a point cloud in a Euclidean space, suppose we are given a threshold r such that any two points within r of each other are considered "close". Then each pair of such points is connected by an edge. If three points are pairwise "close", we add a face, etc. The result is a simplicial complex that approximates the manifold M behind the point cloud. More: Topological data analysis.
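The construction above (the Vietoris-Rips complex) can be sketched in a few lines of Python. This is a brute-force illustration suitable only for tiny point clouds, not jPlex:

```python
import math
from itertools import combinations

def rips_complex(points, r, max_dim=2):
    """Build the Vietoris-Rips complex of a point cloud at threshold r:
    a simplex is added for every set of points that are pairwise within r."""
    n = len(points)
    close = lambda i, j: math.dist(points[i], points[j]) <= r
    simplices = [(i,) for i in range(n)]                 # vertices
    for dim in range(1, max_dim + 1):
        for combo in combinations(range(n), dim + 1):    # candidate simplex
            if all(close(i, j) for i, j in combinations(combo, 2)):
                simplices.append(combo)
    return simplices
```

For example, three mutually close points produce a filled triangle, while a distant fourth point remains an isolated vertex; homology of the resulting complex then answers the questions about components, tunnels, and voids.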
Short term projects, software jPlex (Java):
- applying jPlex to various datasets;
- applying jPlex to the dataset from the image-to-image search project;
- local analysis and dimensionality reduction.
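One simple way to approach the local dimension of a point cloud (an illustrative Python sketch, not a jPlex feature): near a point of a d-dimensional object, the number of neighbors within radius r grows roughly like r^d, so comparing neighbor counts at radii r and 2r gives a crude estimate of d. The function below and its parameters are hypothetical.

```python
import math

def local_dimension(points, center, r):
    """Crude local dimension estimate at one point: since the neighbor
    count scales like r**d, estimate d as log(N(2r)/N(r)) / log 2."""
    def count_within(radius):
        return sum(1 for p in points if 0 < math.dist(p, center) <= radius)
    n_r, n_2r = count_within(r), count_within(2 * r)
    if n_r == 0:
        return None                      # too few neighbors to estimate
    return math.log(n_2r / n_r) / math.log(2)
```

On samples of a curve the estimate is near 1, on samples of a surface near 2, regardless of the ambient dimension.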
Long term projects:
- multiple parameter analysis.
Current project: The topology of data