Posts

Comparison of speed of np.sum in Cython

I was curious how much the overhead of np.sum impacted Cython code. That is, should you write your own sum method that loops or use np.sum? So, I wrote up a little test definition of the two approaches, as shown below:

    import numpy as np
    cimport numpy as np

    cpdef double sum_np(np.ndarray[double] a):
        return np.sum(a)

    cpdef double sum_loop(double[:] a) nogil:
        cdef size_t i, I
        cdef double total = 0.0
        I = a.

Tuesday, May 11, 2021 Read
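As a rough companion to the np.sum post above, here is a minimal sketch in plain Python rather than Cython. It does not reproduce the post's benchmark; it only shows one way to time a NumPy call against an explicit loop. The helper name `seconds_per_call` and the array sizes are assumptions made for illustration.

```python
import timeit

import numpy as np


def sum_np(a):
    """Sum with np.sum, which pays NumPy's fixed per-call cost."""
    return float(np.sum(a))


def sum_loop(a):
    """Sum with an explicit loop (slow in pure Python, cheap once compiled)."""
    total = 0.0
    for value in a:
        total += value
    return total


def seconds_per_call(func, a, number=200):
    """Average wall-clock seconds per call over `number` calls (hypothetical helper)."""
    return timeit.timeit(lambda: func(a), number=number) / number


if __name__ == "__main__":
    # Sizes are illustrative; small arrays are where per-call overhead matters most.
    for n in (10, 100, 1_000, 10_000):
        a = np.random.default_rng(0).random(n)
        print(n, seconds_per_call(sum_np, a), seconds_per_call(sum_loop, a))
```

In pure Python the loop loses badly on large arrays, so these timings say nothing about a compiled Cython loop. The point is only the measuring pattern.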

Truncate to specified number of significant figures

It’s a simple request. You have a number, e.g. 1.25, and want to truncate it to, say, 2 significant figures. You specifically need to truncate instead of rounding, i.e. you need to get 1.2 instead of 1.3. How do you do that quickly in Python? Check out this code snippet:

    from math import log10, floor
    import decimal

    decimal.getcontext().rounding = decimal.ROUND_DOWN

    def truncate_sig(x, sig=2):
        x = float(x)  # make sure it's a float or decimal throws an error
        if x == 0:  # can't take log10(0) so just return 0
            return 0
        elif x <= 0:  # if it's negative, determine sigfigs of positive and multiply by -1
            return -1 * truncate_sig(np.

Monday, May 10, 2021 Read
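The post's snippet is cut off in this listing, so the following is a separate minimal sketch of truncating to significant figures, not the author's function. It assumes it is acceptable to go through `str(x)` and Python's `decimal` module; the name `truncate_sig_decimal` is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN


def truncate_sig_decimal(x, sig=2):
    """Truncate x toward zero to `sig` significant figures (illustrative sketch)."""
    d = Decimal(str(x))
    if d == 0:
        return 0.0
    # adjusted() is the base-10 exponent of the most significant digit.
    exponent = d.adjusted() - sig + 1
    quantum = Decimal(1).scaleb(exponent)  # e.g. Decimal("0.1") for 1.25 with sig=2
    # ROUND_DOWN rounds toward zero, which is truncation for both signs.
    return float(d.quantize(quantum, rounding=ROUND_DOWN))


if __name__ == "__main__":
    print(truncate_sig_decimal(1.25))      # 1.2
    print(truncate_sig_decimal(-1.25))     # -1.2
    print(truncate_sig_decimal(12345, 2))  # 12000.0
```

Passing the rounding mode to `quantize` directly avoids changing the global decimal context, which is a design choice of this sketch rather than something the post states.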

Paper published: Real-time solar image classification: Assessing spectral, pixel-based approaches

This was my first first-author paper! It talks about doing semantic segmentation on the Sun. Here’s the link.

Saturday, November 16, 2019 Read

Axis aligned artifacts

There are artifacts created by choosing axis-aligned cuts in robust random cut forests, similar to what was noted with IsoForest. Left: Original data distribution. Right: Learned co-displacement, darker is lower. Notice the echoes around (10, -10) and (-10, 10). If instead of either of these, you use the depth in the robust random cut forest, you get what’s shown above. The first two examples are recreated by the code below:

Monday, October 7, 2019 Read

State of Astro-informatics

I had a glance through “Realizing the potential of astrostatistics and astroinformatics” by Eadie et al. (2019). While I do not feel qualified or informed enough to comment on the suggestions, I can summarize them quickly. There are three problems: Education: Most astronomers are not trained in code development, which results in code that may be good but is fragile. Similarly, most computer scientists don’t have the astronomy background or connections. Funding: Grants for methodology improvement are scarce.

Friday, October 4, 2019 Read

Training an autoencoder with mostly noise

I am working on a project where we wish to use anomaly detection to find which image patches have structure and which don’t. As an aside, I ran an experiment on MNIST. You have 500 images of fives. You have 5000 images that are pure noise. You train a deep convolutional autoencoder. What you end up with is the following reconstruction: The top row shows the inputs and the bottom row shows the reconstructions.

Tuesday, September 17, 2019 Read

Flood

I stumbled upon a game called Flood. It’s a simple enough game. You start with a grid of random colors. Then, you change the color of the contiguous region formed from the upper left corner until you have flooded the entire grid with one color. I wrote some code and have been tinkering with it. The most naive solver is a breadth-first search. So, I did that. Below you see the solution length for grids of varying size with only three colors.

Monday, September 16, 2019 Read
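The Flood rules and the naive breadth-first solver described above can be sketched as follows. This is a minimal illustration, not the post's code: colors are assumed to be the integers 0 to n_colors - 1, the grid is a list of rows, and the function names are hypothetical.

```python
from collections import deque


def flood(grid, new_color):
    """Recolor the region connected to the upper-left cell; return a new tuple grid."""
    rows, cols = len(grid), len(grid[0])
    old_color = grid[0][0]
    cells = [list(row) for row in grid]
    if old_color != new_color:
        cells[0][0] = new_color
        stack = [(0, 0)]
        while stack:
            r, c = stack.pop()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and cells[nr][nc] == old_color:
                    cells[nr][nc] = new_color
                    stack.append((nr, nc))
    return tuple(tuple(row) for row in cells)


def is_solved(grid):
    """True when every cell has the same color as the upper-left cell."""
    first = grid[0][0]
    return all(cell == first for row in grid for cell in row)


def solve_bfs(grid, n_colors):
    """Return a shortest list of color choices that floods the whole grid."""
    start = tuple(tuple(row) for row in grid)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, moves = queue.popleft()
        if is_solved(state):
            return moves
        for color in range(n_colors):
            if color == state[0][0]:
                continue  # choosing the current color changes nothing
            nxt = flood(state, color)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, moves + [color]))
    return None


if __name__ == "__main__":
    print(solve_bfs([[0, 1, 1], [1, 2, 2], [2, 2, 0]], n_colors=3))
```

Because the search visits whole grid states, memory grows quickly with grid size, which is the usual cost of the naive approach.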

Goal of Anomaly Detection in Non-stationary Data

I was explaining anomaly detection in non-stationary data to someone and threw together this crude example figure. The blue points are nominal and represent 90% of the points. The red are anomalous and represent 10% of the points. In this example, the red data is stationary while the blue passes through it. Thus, it would be very difficult to differentiate the red and blue points when they overlap. However, even if we only had a few frames of this video, we would like to be able to realize there are two dynamics going on.

Monday, September 9, 2019 Read
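The setup in the post above (a stationary anomalous cluster and a nominal cluster that drifts through it) can be simulated along these lines. This is a minimal sketch under stated assumptions: Gaussian clusters, a straight-line drift from x = -5 to x = 5, and the hypothetical function name `make_frames`. The original figure may have been produced differently.

```python
import numpy as np


def make_frames(n_frames=20, n_points=200, anomaly_fraction=0.1, seed=0):
    """Return (frames, labels): frames has shape (n_frames, n_points, 2).

    labels is 0 for nominal (blue) points and 1 for anomalous (red) points.
    """
    rng = np.random.default_rng(seed)
    n_red = int(round(n_points * anomaly_fraction))
    n_blue = n_points - n_red
    labels = np.concatenate([np.zeros(n_blue, dtype=int), np.ones(n_red, dtype=int)])

    frames = []
    # Assumed dynamics: the blue cluster's center moves along the x axis,
    # while the red cluster stays at the origin.
    for center_x in np.linspace(-5.0, 5.0, n_frames):
        blue = rng.normal(loc=(center_x, 0.0), scale=1.0, size=(n_blue, 2))
        red = rng.normal(loc=(0.0, 0.0), scale=1.0, size=(n_red, 2))
        frames.append(np.vstack([blue, red]))
    return np.stack(frames), labels


if __name__ == "__main__":
    frames, labels = make_frames()
    print(frames.shape, labels.mean())
```

Near the middle frames the two clusters overlap, which is the situation where single-frame separation is hard and the dynamics carry the signal.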

The Value of a Peer-Reviewed Activity

This week, we have been talking about proof writing in the discrete mathematics course I’m teaching. Yesterday, I started class by having students answer how confident they are about their proof writing skills on a scale of 1 to 10, 1 being “clueless and not sure where to start”, 5 being “Okay and ready for homework,” and 10 being “I can do most any proof you throw at me with ease.

Friday, June 14, 2019 Read

Atypicality Presentation Recap

Yesterday, I gave a presentation introducing the ideas of atypicality to the Monteleoni research group. These are the slides and handwritten notes. I plan to explore this idea further and write up better LaTeX notes, which I will then share as well. For now, the idea of atypicality centers around using two coders: one trained to perform best on typical data and one that is universal and not data specific. A sequence is atypical if its code length using the typical coder is longer than the universal coder, i.

Friday, May 3, 2019 Read

Ulam–Warburton automaton inquiry

The Ulam-Warburton automaton is a simple growing pattern. See Wikipedia or this great Numberphile video for more information. For a more technical treatment, see this paper too. (Ulam-Warburton animation from Wikipedia.) I was curious what you’d get under various other versions of it, using the same basic rule of “turn on cells with exactly one neighbor” but with a tweak. For example, what happens if a cell turns off after being activated for a few cycles?

Wednesday, April 24, 2019 Read
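The basic rule quoted above, "turn on cells with exactly one neighbor", can be sketched on an unbounded grid with a set of coordinates. This assumes the standard four-neighbor (von Neumann) version with a single starting cell and does not implement any of the post's tweaks. The function name is hypothetical.

```python
def ulam_warburton(generations):
    """Run the basic automaton; return (on_cells, counts per generation)."""
    on = {(0, 0)}  # generation 1: a single active cell
    counts = [len(on)]
    for _ in range(generations - 1):
        # Count, for each off cell, how many of its four neighbors are on.
        neighbor_counts = {}
        for x, y in on:
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cell = (x + dx, y + dy)
                if cell not in on:
                    neighbor_counts[cell] = neighbor_counts.get(cell, 0) + 1
        # Turn on exactly those off cells with one active neighbor.
        on |= {cell for cell, n in neighbor_counts.items() if n == 1}
        counts.append(len(on))
    return on, counts


if __name__ == "__main__":
    _, counts = ulam_warburton(5)
    print(counts)  # [1, 5, 9, 21, 25]
```

A variant where cells turn off after a few cycles would need each cell's activation age stored as well, for example in a dict instead of a set.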

Training a denoising autoencoder with noisy data

How do you denoise images with an autoencoder if you don’t have a clean version to train with? One option is to add more noise to your images! In this experiment, I trained an autoencoder with noisy MNIST data. I began with the noiseless MNIST images, shown on the bottom row. To simulate observational data, I added Gaussian noise to the images. In reality, we may never have access to these noiseless images.

Wednesday, March 27, 2019 Read
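One way to read "add more noise to your images" is to build training pairs where the input is the observed noisy image plus extra noise and the target is the observed noisy image itself. The sketch below covers only that data preparation step. It assumes Gaussian noise, an arbitrary noise level, and random arrays standing in for MNIST; the excerpt does not confirm that this is exactly the post's training setup.

```python
import numpy as np


def add_gaussian_noise(images, sigma, rng):
    """Return a copy of `images` with zero-mean Gaussian noise of std `sigma` added."""
    return images + rng.normal(0.0, sigma, size=images.shape)


def make_noisier_pairs(observed, extra_sigma=0.3, seed=0):
    """Build (inputs, targets): inputs are noisier copies, targets are the observed images.

    `extra_sigma` is an assumed value, not one taken from the post.
    """
    rng = np.random.default_rng(seed)
    inputs = add_gaussian_noise(observed, extra_sigma, rng)
    return inputs, observed


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = rng.random((8, 28, 28))                # stand-in for clean MNIST digits
    observed = add_gaussian_noise(clean, 0.3, rng)  # simulated observational data
    inputs, targets = make_noisier_pairs(observed)
    print(inputs.shape, targets.shape)
```

An autoencoder would then be trained to map `inputs` to `targets`, without ever seeing `clean`.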