Axis aligned artifacts
Choosing axis-aligned cuts in robust random cut forests creates artifacts, similar to what was noted with IsoForest. Left: Original data distribution. Right: Learned co-displacement, darker is lower. Notice the echoes around (10, -10) and (-10, 10). If instead of either of these, you use the depth in the robust random cut forest, you get what’s shown above. The first two examples are recreated by the code below:
State of Astro-informatics
I had a glance through “Realizing the potential of astrostatistics and astroinformatics” by Eadie et al. (2019). While I do not feel qualified or informed enough to comment on the suggestions, I can summarize them quickly. There are three problems: Education: Most astronomers are not trained in code development, which results in code that may work well but is fragile. Similarly, most computer scientists don’t have the astronomy background or connections. Funding: Grants for methodology improvement are scarce.
Training an autoencoder with mostly noise
I am working on a project where we wish to use anomaly detection to find which image patches have structure and which don’t. As an aside, I ran an experiment on MNIST. You have 500 images of fives. You have 5000 images that are pure noise. You train a deep convolutional autoencoder. What you end up with is the following reconstruction: The top row shows the inputs and the bottom row shows the reconstructions.
I stumbled upon a game called Flood. It’s a simple enough game. You start with a grid of random colors. Then, you change the color of the contiguous region formed from the upper left corner until you have flooded the entire grid with one color. I wrote some code and have been tinkering with it. The most naive solver is a breadth-first search, so I did that. Below you see the solution length for grids of varying size with only three colors.
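To make the breadth-first idea concrete, here is a minimal sketch of such a solver. It is not the code from the post: the grid encoding (integers for colors), the function names `flood` and `solve`, and the choice to treat each whole grid as a search state are all assumptions made for illustration.

```python
from collections import deque


def flood(grid, color):
    """Recolor the region connected to the upper left corner (hypothetical helper).

    `grid` is a tuple of tuples of integer colors; a new grid is returned.
    """
    rows, cols = len(grid), len(grid[0])
    old = grid[0][0]
    new = [list(row) for row in grid]
    if old != color:
        new[0][0] = color
        stack = [(0, 0)]
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                # Only cells still holding the old color belong to the region.
                if 0 <= nr < rows and 0 <= nc < cols and new[nr][nc] == old:
                    new[nr][nc] = color
                    stack.append((nr, nc))
    return tuple(tuple(row) for row in new)


def solve(grid, n_colors):
    """Breadth-first search for a shortest sequence of color choices."""
    start = tuple(tuple(row) for row in grid)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, moves = queue.popleft()
        # Solved once every cell shares a single color.
        if len({cell for row in state for cell in row}) == 1:
            return moves
        for color in range(n_colors):
            if color == state[0][0]:
                continue  # choosing the current color changes nothing
            nxt = flood(state, color)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, moves + [color]))
    return None
```

Because the search explores all move sequences of length k before any of length k + 1, the first solved grid it reaches gives a shortest solution. The number of reachable grids grows quickly, so a solver like this is only practical for small grids.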
Goal of Anomaly Detection in Non-stationary Data
I was explaining anomaly detection in non-stationary data to someone and threw together this crude example figure. The blue points are nominal and represent 90% of the points. The red points are anomalous and represent 10% of the points. In this example, the red data is stationary while the blue passes through it. Thus, it would be very difficult to differentiate the red and blue points when they overlap. However, even if we only had a few frames of this video, we would like to be able to realize there are two dynamics going on.
The Value of a Peer-Reviewed Activity
This week, we have been talking about proof writing in the discrete mathematics course I’m teaching. Yesterday, I started class by having students answer how confident they are about their proof writing skills on a scale of 1 to 10, 1 being “clueless and not sure where to start”, 5 being “Okay and ready for homework,” and 10 being “I can do most any proof you throw at me with ease.
Atypicality Presentation Recap
Yesterday, I gave a presentation introducing the ideas of atypicality to the Monteleoni research group. These are the slides and handwritten notes. I plan to explore this idea further and write up better LaTeX notes, which I will then share as well. For now, the idea of atypicality centers around using two coders: one trained to perform best on typical data and one that is universal and not data specific. A sequence is atypical if its code length under the typical coder is longer than its code length under the universal coder, i.
Ulam–Warburton automaton inquiry
The Ulam-Warburton automaton is a simple growing pattern. See Wikipedia or this great Numberphile video for more information. For a more technical treatment, see this paper too. Ulam-Warburton animation from Wikipedia. I was curious what you’d get under various other versions of it, using the same basic rule of “turn on cells with exactly one neighbor” but with a tweak. For example, what happens if a cell turns off after being activated for a few cycles?
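As a point of reference, the basic rule can be sketched in a few lines. This is an illustrative sketch, not the code behind the post: it assumes the standard four-neighbor (up, down, left, right) version on an unbounded grid, stores the "on" cells as a set of coordinates, and does not implement any of the tweaks. The names `step` and `grow` are hypothetical.

```python
def step(on):
    """Advance one generation: turn on every off cell with exactly one on neighbor."""
    counts = {}
    for x, y in on:
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb not in on:
                # Count how many on cells touch this off cell.
                counts[nb] = counts.get(nb, 0) + 1
    return on | {cell for cell, n in counts.items() if n == 1}


def grow(generations):
    """Return the number of on cells after each generation, starting from one cell."""
    on = {(0, 0)}
    sizes = [len(on)]
    for _ in range(generations):
        on = step(on)
        sizes.append(len(on))
    return sizes
```

Calling `grow(4)` gives the cell counts for the first five generations of the untweaked automaton. A variant where cells switch off after a few cycles would need to track each cell's age as well, for instance with a dictionary from coordinates to age instead of a set.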
Training a denoising autoencoder with noisy data
How do you denoise images with an autoencoder if you don’t have a clean version to train with? One option is to add more noise to your images! In this experiment, I trained an autoencoder with noisy MNIST data. I began with the noiseless MNIST images shown on the bottom row. To simulate observational data, I added Gaussian noise to the images. In reality, we may never have access to these noiseless images.
TSS versus f1-measure
The above movie shows how accuracy, TSS, and f1-measure change under the assumption that a classifier has no false positives until it has classified all of a class correctly. The vertical grey line shows the actual percentage of the features having a given class, while the horizontal axis shows what percentage of the class is identified by the model. For example, if the true class percentage is 0.1, as shown below, we see that an aggressive classifier, one that prefers creating false positives, is punished much less by accuracy and TSS than by the f1-measure.
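The comparison can be reproduced numerically with a short sketch. This is not the code that generated the movie. It assumes the standard definitions (TSS as true positive rate minus false positive rate, f1 as the harmonic mean of precision and recall) and models the stated assumption by counting the items the classifier flags as positive. The function names and the `n_flagged` parameter are hypothetical.

```python
def confusion_counts(n_total, n_positive, n_flagged):
    """Confusion counts when every flagged item is a true positive until the class is exhausted.

    Returns (tp, fp, fn, tn).
    """
    tp = min(n_flagged, n_positive)
    fp = max(n_flagged - n_positive, 0)
    fn = n_positive - tp
    tn = n_total - n_positive - fp
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Return (accuracy, TSS, f1); assumes both classes are present and tp + fp + fn > 0."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    tss = tp / (tp + fn) - fp / (fp + tn)  # true positive rate minus false positive rate
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, tss, f1


# Aggressive classifier: 10% of 1000 items are positive, but 20% are flagged.
aggressive = scores(*confusion_counts(1000, 100, 200))
# Conservative classifier: only half of the true positives are flagged.
conservative = scores(*confusion_counts(1000, 100, 50))
```

With these numbers the aggressive classifier keeps an accuracy of 0.9 and a TSS of about 0.89, while its f1 drops to about 0.67, the same f1 as the conservative classifier that finds only half of the class.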
Generate Thematic Maps from Heliophysics Event Knowledgebase
The script below will allow you to generate thematic maps from the valuable Heliophysics Event Knowledgebase (HEK). I have written it to take a SUVI thematic map as input and output only Spatial Possibilistic Clustering Algorithm (SPoCA) coronal hole and bright region patches in HEK, but I am willing to help others modify the script as needed. Expert labeled map. SPoCA map from HEK. The script can be found below and bundled with smachy, my solar image segmentation toolkit.