Methodology

Framing Question
How is knowledge constructed in science and how is its validity assessed?

To answer this question, I will look at primary and secondary sources of historical and current meteorology and climate science.
I will also look at a specific example of current cloud classification research through my focus question.

Focus Question

Where does our understanding of cloud classification come from and how accurate is it?

I will develop an automated cloud classifier from Total Sky Imager (TSI) images and evaluate how accurately the algorithm classifies cloud type. This research will address the second part of the focus question because I will be testing the validity of the results. I will also analyze the tools and algorithms I am using to trace their origins and the networks through which they contribute to cloud classification.
I will then tie this methodology together by examining the production process of automated cloud classification. I will ask myself how I am producing knowledge and consider whether, and how, this process exhibits hybridity in developing understanding and distanciation. I will also identify any motivating forces, values, and attitudes.

This methodology will place current cloud research into a much larger temporal and spatial picture of meteorology and climate science.

Below is a simplified sub-methodology for testing whether an automated classifier can replicate human observations.

The TSI images must be preprocessed to remove anything that is not sky. This is something that we, as human observers, do without thinking about it. The center of each image must be identified, as well as the location of the sun. The parts of the instrument and horizon visible in the image must also be removed. Following the improvements in Long (2008), we added options to include a sun circle and a horizon area mask.
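The masking step above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the processing code used in this project: the function name, the pixel radii, and the center and sun positions are all hypothetical, and it only covers circular masks. Visible instrument parts would need an additional mask.

```python
import numpy as np

def sky_mask(shape, center, sky_radius, sun_xy=None, sun_radius=0.0):
    """Return a boolean mask that is True for pixels treated as sky.

    All parameters are hypothetical: `center` and `sun_xy` are (row, col)
    pixel positions, and both radii are given in pixels.
    """
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    # Distance of every pixel from the image center.
    dist_center = np.hypot(rows - center[0], cols - center[1])
    # Horizon mask: keep only pixels inside the sky circle.
    mask = dist_center <= sky_radius
    if sun_xy is not None:
        # Sun circle: drop pixels within sun_radius of the sun position.
        dist_sun = np.hypot(rows - sun_xy[0], cols - sun_xy[1])
        mask &= dist_sun > sun_radius
    return mask

# Example with made-up values for a 352 by 288 pixel image.
mask = sky_mask((288, 352), center=(144, 176), sky_radius=120,
                sun_xy=(100, 200), sun_radius=15)
```

Passing the image center as a parameter means a shifted center only changes an input value, not the masking logic.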
Discontinuities in the data set must be accounted for. The data switched from 352 by 288 pixels to 640 by 480 pixels in 2011, so we have to process these two sets of images differently. The naming convention also changed over time, which further complicates the image processing. As discussed before, the center of the images may shift over time due to changes in the camera rig. In the centroid lookup table we also included a list of dates that are missing from the data because of instrument maintenance, or whose images are otherwise compromised by dirt, bird droppings, or birds.
Images can be characterized using statistical features. Following Calbo and Sabburg (2008), the images are converted to the red-to-blue (R/B) ratio, which makes it easier to capture cloud information. I calculated the Mean (ME), Standard Deviation (SD), Third Moment (TM), Uniformity (UF), Entropy (EY), and Smoothness (SM) from the R/B image histogram. Using MATLAB's built-in image gradient function, I also calculated the median magnitude of the gradient (MG).
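As an illustration of these features, the sketch below computes them in Python with NumPy rather than MATLAB. It assumes the standard histogram-based texture definitions (uniformity as the sum of squared bin probabilities, entropy in bits, smoothness as 1 - 1/(1 + variance)). Whether these match the exact formulas in Calbo and Sabburg (2008) is an assumption. The function name, the bin count, and the use of `numpy.gradient` in place of the MATLAB gradient function are also illustrative choices.

```python
import numpy as np

def rb_features(rgb, mask=None, bins=256):
    """Histogram and gradient features of the red-to-blue ratio image.

    `rgb` is an (H, W, 3) array; `mask` is an optional boolean sky mask.
    """
    rgb = np.asarray(rgb, dtype=float)
    red, blue = rgb[..., 0], rgb[..., 2]
    # Guard against division by zero in dark pixels (an assumed choice).
    ratio = red / np.maximum(blue, 1.0)
    values = ratio[mask] if mask is not None else ratio.ravel()

    # Normalized histogram p(z) over bin centers z.
    counts, edges = np.histogram(values, bins=bins)
    p = counts / counts.sum()
    z = 0.5 * (edges[:-1] + edges[1:])

    mean = np.sum(z * p)
    var = np.sum((z - mean) ** 2 * p)
    third = np.sum((z - mean) ** 3 * p)
    uniformity = np.sum(p ** 2)
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))
    smoothness = 1.0 - 1.0 / (1.0 + var)

    # Median gradient magnitude of the ratio image.
    gy, gx = np.gradient(ratio)
    grad_mag = np.hypot(gx, gy)
    mg = np.median(grad_mag[mask] if mask is not None else grad_mag)

    return {"ME": mean, "SD": np.sqrt(var), "TM": third, "UF": uniformity,
            "EY": entropy, "SM": smoothness, "MG": mg}

# A flat synthetic image (R = 100, B = 200 everywhere) has no texture at all.
flat = np.zeros((20, 30, 3))
flat[..., 0] = 100.0
flat[..., 2] = 200.0
features = rb_features(flat)
```

For the flat test image every pixel has the same ratio, so the histogram collapses into one bin: uniformity is 1 and the standard deviation, entropy, smoothness, and gradient are all 0.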
Additional image features can be derived from the thresholded image, which provides cloud mass and gap information.
A test set of images is classified by a team of human observers. The set is then narrowed to include only the images for which the observers agreed on the classification.
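The agreement filter can be sketched in a few lines of Python. This assumes that "agreed" means every observer gave the same label, which is one possible reading. The file names and labels are made up for illustration.

```python
def unanimous(labels_by_image):
    """Keep only the images whose observers all gave the same label.

    `labels_by_image` maps an image name to the list of labels assigned
    by the human observers (a hypothetical data layout).
    """
    return {
        name: labels[0]
        for name, labels in labels_by_image.items()
        if labels and len(set(labels)) == 1
    }

# Made-up example: the observers disagree on the second image.
observed = {
    "img_001.jpg": ["cumulus", "cumulus", "cumulus"],
    "img_002.jpg": ["stratus", "cumulus", "stratus"],
    "img_003.jpg": ["clear", "clear", "clear"],
}
agreed = unanimous(observed)
```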
Then, two automated classifiers compare each image to the manually classified images. Each image is assigned a class based roughly on the differences between its image features and those of the manually classified images.
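To make classification by feature difference concrete, here is a minimal Python sketch of a k-nearest-neighbor classifier. The text does not say which two classifiers are used, so this is an assumed example of the general idea rather than either of them. The function name, the choice of k, the standardization step, and the toy feature values are all hypothetical.

```python
from collections import Counter

import numpy as np

def knn_classify(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training images."""
    train_X = np.asarray(train_X, dtype=float)
    query = np.asarray(query, dtype=float)
    # Standardize each feature so that no single feature dominates the distance.
    mu = train_X.mean(axis=0)
    sigma = train_X.std(axis=0)
    sigma[sigma == 0] = 1.0
    scaled_train = (train_X - mu) / sigma
    scaled_query = (query - mu) / sigma
    # Euclidean distance between the query and every training image.
    dists = np.linalg.norm(scaled_train - scaled_query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy feature vectors [ME, SD] with made-up values and labels.
train_X = [[0.90, 0.05], [0.85, 0.06], [0.88, 0.04],
           [0.50, 0.20], [0.55, 0.22], [0.52, 0.18]]
train_y = ["stratus", "stratus", "stratus", "cumulus", "cumulus", "cumulus"]

label_a = knn_classify(train_X, train_y, [0.87, 0.05])
label_b = knn_classify(train_X, train_y, [0.53, 0.20])
```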
We then assess the performance of the classifiers by comparing the results to the manually classified images.
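One common way to carry out this comparison is a confusion matrix plus an overall accuracy. The Python sketch below assumes that approach. The class names and labels are invented for illustration and are not results from this study.

```python
import numpy as np

def confusion_matrix(manual, automated, classes):
    """Count images by (manual class, automated class).

    Rows are the manual labels and columns are the automated labels.
    """
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for m, a in zip(manual, automated):
        cm[index[m], index[a]] += 1
    return cm

# Made-up labels for six images.
classes = ["clear", "cumulus", "stratus"]
manual = ["clear", "cumulus", "stratus", "cumulus", "stratus", "clear"]
automated = ["clear", "cumulus", "cumulus", "cumulus", "stratus", "clear"]

cm = confusion_matrix(manual, automated, classes)
# Fraction of images where the automated label matches the manual one.
accuracy = np.trace(cm) / cm.sum()
```

The off-diagonal entries show which cloud types the classifier confuses with one another, which a single accuracy figure hides.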