October 11

Lab 8: Data Analysis

10-12-18

Marci Jordan

Bio Lab 1105-31

Pre-Lab and Lecture

Learning Theory: Fail to Succeed

Dr. Adair gave us a short podcast to listen to called “Why Science Needs Failure to Succeed” by Science Friday. The podcast was very interesting, as the host and two guests, Stuart Firestein and Helen Snodgrass, commented on the subject. Some messages I liked and took away are listed here. Success in science gets all the attention. People are awarded the Nobel Prize for their discoveries, but without failure those discoveries never would have been made. To be a scientist you must have the stomach for failure but also love the taste of failure. Failures lead scientists down the correct path because they take them to new places where they find new discoveries. An example of this is the discovery of the G-protein receptor. The experiment kept failing because of the technique used to wash the supplies. There was aluminum fluoride in the washing solution, which disrupted the experiment but led to the discovery that trace metals are activators of enzymes. In school, they only teach about the discoveries, but the discoveries that everyone learns about all came from failures. In Helen Snodgrass’s classroom she has “Failure is not an option, it is a requirement” posted on the wall. I really enjoyed this podcast because it not only explained that it’s okay to fail in science, but also emphasized that failure is necessary to be successful.

Bioskills:

Data Analysis

The following tutorials do not use our class lab data. In statistical analysis we will use four tools from the Excel Analysis ToolPak: Descriptive Statistics, Histogram, the F-Test Two-Sample for Variances, and the Two-Sample t-Test. Each of these tools has a different purpose, so I will list the purpose of each below. The Descriptive Statistics analysis tool generates a report of univariate statistics for data in the input range, providing information about the central tendency and variability of your data. The Histogram analysis tool calculates individual and cumulative frequencies for a cell range of data and data bins. This tool generates data for the number of occurrences of a value in a data set. We can use it to generate a graph to check whether the data are normally distributed. The F-Test Two-Sample for Variances analysis tool performs a two-sample F-test to compare two population variances. This result will be used to choose the type of t-Test. Lastly, the Two-Sample t-Test analysis tools test for equality of the population means that underlie each sample. The three t-Test tools employ different assumptions: that the population variances are equal, that the population variances are not equal, and that the two samples represent before-treatment and after-treatment observations on the same subjects.

Data analysis is simply the conversion of raw data into meaningful information. Remember that statistical analysis indicates correlation, not causality. The experimental design can contribute to the ability to draw inferences from an experiment. For example, a controlled experiment that has only 1 variable is easier to interpret than a comparison of cell counts from a variety of natural environments.

Below are definitions of commonly used terms in data analysis. A proportion is a fraction. For example, out of the 100 organisms seen in the soil, there were 3 ciliates (3/100). A percentage is another way to express a proportion by multiplying it by 100% (the proportion above would be 0.03 x 100% = 3%). The mean is the average of a set of data, and can be obtained by dividing the sum of the set by the number of data points. The mean is sensitive to extreme values: when there are outliers in a set of data, the mean is less helpful as an analysis tool. The median is the middle of a set of data, and can be found by arranging the data in numerical order and finding the middle-most data point. The median is less sensitive to extreme values, and can be useful despite the presence of outliers. The range of a data set is the difference between the extreme measurements (smallest and largest) of the sample. The standard deviation is a measure that is used to quantify the amount of variation or dispersion of a set of data values. Basically, how spread out are the values? Variance is the expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of (random) numbers is spread out from its average value.

Charts and graphs are used in data analysis to display trends, relationships, and comparisons. Different types of graphs are used for displaying different types of information. For instance, a line graph displays trends over time, a bar chart allows comparisons between categories of data, and a pie chart shows proportional share. All charts and graphs should be clearly labeled and should enhance the viewer’s understanding of the topic. Tables are an excellent alternative to charts and graphs. While it is easier to detect patterns in graphs, tables allow you to display much larger quantities of data.
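The definitions above can be tried out on a small example. The sketch below uses made-up counts (they are an assumption for illustration, not our class data) to show how one outlier pulls the mean much more than the median, and it repeats the 3-out-of-100 ciliate proportion from the text.

```python
from statistics import mean, median

# Hypothetical counts, not class data: five typical samples plus one outlier.
counts = [18, 20, 21, 22, 24]
with_outlier = counts + [200]

# Proportion and percentage: 3 ciliates among 100 organisms (the example in the text).
proportion = 3 / 100
percentage = proportion * 100

print(f"proportion = {proportion}, percentage = {percentage}%")
print(f"mean   without / with outlier: {mean(counts)} / {mean(with_outlier):.1f}")
print(f"median without / with outlier: {median(counts)} / {median(with_outlier)}")
print(f"range  without / with outlier: "
      f"{max(counts) - min(counts)} / {max(with_outlier) - min(with_outlier)}")
```

The mean more than doubles when the outlier is added, while the median barely moves, which is why the median is the safer summary when outliers are present.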

Statistical Analysis Overview and Descriptive Analysis

A screen capture performed by Dr. Adair showed the steps to get the statistical analysis data. First, open the Excel file generated in class. Then click the Data tab and, on the right side, click on “Data Analysis”. By scrolling down in the box that pops up, you can see the four tools we will be using in lab, which I described individually above.

Another screen capture performed by Dr. Adair showed the steps to get the descriptive analysis data. She performed this using the cell count data from class. First, she opened a new sheet in Excel (at the bottom). Next, she copied and pasted different cell counts from different sections. Then she clicked the Data tab, clicked the Data Analysis button, and selected Descriptive Statistics. Lastly, she selected the input range and output range. Also, be sure to check the “Summary statistics” box, as it will give you all the information you need. Once the new data is generated, you can compare the results side by side.
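For anyone who wants to check the Excel output by hand, here is a minimal sketch that computes most of the same summary fields with Python's standard library. The function name `summary_statistics` and the sample values are my own assumptions for illustration; kurtosis and skewness are left out to keep it short.

```python
import math
from statistics import mean, median, mode, stdev, variance

def summary_statistics(values):
    """Return a dict mirroring part of an Excel-style summary report."""
    n = len(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "Mean": mean(values),
        "Standard Error": sd / math.sqrt(n),
        "Median": median(values),
        "Mode": mode(values),
        "Standard Deviation": sd,
        "Sample Variance": variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }

# Hypothetical vacuole counts, not the class data.
report = summary_statistics([2, 3, 3, 4, 8])
for name, value in report.items():
    print(f"{name}: {value}")
```

The standard error here is the standard deviation divided by the square root of the count, which is the same relationship that holds between those rows in the Excel report.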

The Null Hypothesis, P-Values, and Statistical Testing

Nothing in science can be proven; a hypothesis can only be disproven. In other words, we cannot prove that a certain chemical causes Tetrahymena to grow or not, because this proof would require that we have thought of, successfully tested, and have the results of every possible observation concerning this hypothesis. Instead, we can state the negative and say the chemical will have no effect on the growth of Tetrahymena. This is the null hypothesis. If we detect a difference in growth, then we can reject the null hypothesis. This falsification of the null hypothesis supports the idea that there is an effect caused by the chemical. If this is the result, we have reason to investigate the effect further. If we do not detect a difference in growth, then we fail to reject the null hypothesis and have no evidence of an effect. We did not prove there was no effect; our data were simply consistent with the null hypothesis.

There are many statistical tests and rules concerning hypothesis testing. In most cases, these tests ask in a mathematical manner how probable the data are if the null hypothesis is true. The probability is reported as a P-value. The smaller the P-value, the stronger the statistical evidence against the null hypothesis. A common cutoff (significance level) is 0.05: the P-value must fall below 0.05 in order to reject the null hypothesis.

So, if our null hypothesis predicts that the difference between the control cell counts and the treatment cell counts is 0 (there is no effect caused by our treatment) and we compare the values using a statistical test (such as Student's T-test) and calculate a P-value of 0.001, then we would reject the null hypothesis and support the idea that the treatment did have an effect. It does not prove that the null hypothesis is false; it just indicates that data like ours would be very unlikely if the null hypothesis were true.
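The decision rule can be written as a tiny function. This is only a sketch of the usual convention, assuming a significance level of 0.05; the function name `decide` is made up for illustration.

```python
def decide(p_value, alpha=0.05):
    """Return the conventional decision for a P-value at significance level alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.001))  # the P-value from the example above
print(decide(0.20))   # a hypothetical P-value above the cutoff
```

Note that neither outcome proves anything: rejecting means the data would be unlikely under the null hypothesis, and failing to reject means the data did not give enough evidence against it.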

The T-Test

In our experiment, the effect of a substance was measured through cell counts. Our null hypothesis states that there was no difference between the control and the treatment at a certain time period. In order to determine if the means are the same or different, you need to use statistical tests. One of the most common ways to test the difference between 2 means is called Student's T-test. This test can be found in the list of statistical functions in Excel (T.TEST) and is listed in the Data Analysis tools along with the other tools we will be using.

This test has some assumptions which can themselves be tested using a histogram and an F test, also found in the Data Analysis tools.

  1. The individual observations are independent of each other. (The occurrence of one does not affect the probability of occurrence of another.) TRUE
  2. The distribution of observations is continuous (not categorical). TRUE
  3. The observations are normally distributed. Make a histogram to test this.
  4. The variances of the control and treatment samples have to be compared. Use the F test.

Once you have determined if you have equal or unequal variances, run the appropriate T-test.

Purpose

The purpose of this lab was to run statistics on our results. Statistics involves estimation, inference, and study design, which help us see patterns. Students will not only get to calculate statistics but also compare their results and find out what they mean. The statistics might not show anything exciting, but they put the data in a format that can be compared, which is the purpose.

Objective

The objective this week was for students to perform a series of statistical analyses. These analyses will be done on the combined data from everyone in the section from last week. Students will properly perform descriptive statistics, histograms, an F-test, and a T-test. Each of these tools will be used twice: once for the cell counts (control and treatment) and once for the assay they performed (either cell speed, directional change, or vacuole formation).

Hypothesis

I think that students will be very intrigued during this lab. It was a lot of fun to finally perform the experiment and get data, and now we can compare and see what the results mean. However, I think that students who did not complete the pre-lab will have difficulty, as the pre-lab allowed students to use these tools before class.

Procedures

Descriptive Statistics

  1. Perform descriptive statistics for cell counts
  2. Run it twice: once for the control and once for the treatment
  3. Perform descriptive statistics for vacuole formation
  4. Run it twice: once for the control and once for the treatment

Histogram

  1. First, make a column of data and a column of bin numbers
  2. Bin numbers are the X point values you want on your graph
  3. Create four histograms (cell count control, cell count treatment, vacuole control, and vacuole treatment)
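The binning step can be sketched in Python. This assumes the histogram counts each value in the first bin whose number is greater than or equal to the value, with anything above the last bin going to a "More" row, which matches the layout of the frequency tables in the Data section. The function name `bin_frequencies` and the sample values are made up for illustration.

```python
from bisect import bisect_left

def bin_frequencies(values, bins):
    """Count values per bin: each bin holds values above the previous bin
    number and up to and including its own; the rest go to "More"."""
    edges = sorted(bins)
    counts = [0] * (len(edges) + 1)  # last slot is the "More" row
    for v in values:
        # Index of the first bin number that is >= v (or the "More" slot).
        counts[bisect_left(edges, v)] += 1
    labels = [str(e) for e in edges] + ["More"]
    return list(zip(labels, counts))

# Hypothetical vacuole counts, not the class data.
for label, count in bin_frequencies([0, 1, 2, 2, 3, 4, 5, 8, 9], [0, 2, 4, 6, 8]):
    print(label, count)
```

A quick sanity check on any histogram table is that the frequencies add up to the number of observations.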

F-Test

  1. Use an F-Test to compare the variances of the control and the treatment
  2. First, run an F-Test for cell counts
  3. Second, run an F-Test for vacuole formation
  4. Then look at the F-Test results to determine which T-test to use

T-Test

  1. Use the appropriate T-test based on variance
  2. First, run a T-test for cell counts
  3. Second, run a T-test for vacuole formations

Once all these tools have been used and you have all the tables and graphs, transfer the final answers into a Word document.

Data

 

Cell Count Descriptive Statistics:

Control:
Mean 21000
Standard Error 1585.887005
Median 19000
Mode 5500
Standard Deviation 11973.18432
Sample Variance 143357142.9
Kurtosis -0.618398756
Skewness 0.508417605
Range 44500
Minimum 4500
Maximum 49000
Sum 1197000
Count 57

Treatment:
Mean 56388.88889
Standard Error 5703.139294
Median 42750
Mode 22000
Standard Deviation 41909.34361
Sample Variance 1756393082
Kurtosis 5.977648617
Skewness 2.101090224
Range 221000
Minimum 11000
Maximum 232000
Sum 3045000
Count 54


 

Vacuole Formation Descriptive Statistics (5 mins)

Control:
Mean 2.8
Standard Error 0.232599579
Median 3
Mode 2
Standard Deviation 1.471088904
Sample Variance 2.164102564
Kurtosis 2.972472706
Skewness 0.974634507
Range 8
Minimum 0
Maximum 8
Sum 112
Count 40


 

Treatment:
Mean 3.7
Standard Error 0.423643346
Median 3
Mode 3
Standard Deviation 1.894590638
Sample Variance 3.589473684
Kurtosis -0.071350149
Skewness 0.842033042
Range 7
Minimum 1
Maximum 8
Sum 74
Count 20

 

Histograms for Cell Counts

Bin number for Control Frequency
4500 1
13375 19
22250 13
35625 15
49000 9
More 0


 

 

Bin number for treatment Frequency
11000 1
66250 33
121500 17
176750 1
232000 2
More 0

 

Histograms for Vacuole Formations

Bin number for Control Frequency
0 2
2 16
4 18
6 3
8 1
More 0

 

 

Bin number for Treatment Frequency
0 0
2 6
4 8
6 4
8 2
More 0

 

 

F-Tests for Cell Counts

F-Test Two-Sample for Variances
  control Treatment
Mean 21000 56388.88889
Variance 143357142.9 1756393082
Observations 57 54
df 56 53
F 0.081620193
P(F<=f) one-tail 0
F Critical one-tail 0.638846439

 

F-Tests for Vacuole Formation

F-Test Two-Sample for Variances
  Control: Treatment:
Mean 2.8 3.7
Variance 2.164102564 3.589473684
Observations 40 20
df 39 19
F 0.602902474
P(F<=f) one-tail 0.089427538
F Critical one-tail 0.537657403
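The F values in the two tables above are just the ratio of the two sample variances, so they can be rechecked from the reported numbers. The sketch below also compares each F with the reported one-tail critical value. The helper names are made up, and the comparison rule assumes that when F is below 1 the critical value is a lower-tail cutoff, so the variances are judged different only if F falls below it.

```python
def f_ratio(var_1, var_2):
    """F statistic as the ratio of the first sample variance to the second."""
    return var_1 / var_2

def variances_differ(f, f_critical):
    """Compare F with a one-tail critical value taken from the output table.

    When F is below 1 the critical value is a lower-tail cutoff, so the
    variances are judged different if F falls below it; when F is above 1
    the comparison flips.
    """
    return f < f_critical if f < 1 else f > f_critical

# Variances and critical values copied from the tables above.
cell_f = f_ratio(143357142.9, 1756393082)
vacuole_f = f_ratio(2.164102564, 3.589473684)

print(f"cell counts: F = {cell_f:.6f}, "
      f"variances differ: {variances_differ(cell_f, 0.638846439)}")
print(f"vacuoles:    F = {vacuole_f:.6f}, "
      f"variances differ: {variances_differ(vacuole_f, 0.537657403)}")
```

Under this rule the cell count variances clearly differ. The vacuole F value sits above its critical value, in line with its one-tail P-value of about 0.089 being larger than 0.05, so for that assay the F-test alone does not show unequal variances and an equal-variance T-test could also be considered.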

 

T-Tests for Cell Counts

T-Test: Two-Sample Assuming Unequal Variances
  control Treatment
Mean 21000 56388.88889
Variance 143357142.9 1756393082
Observations 57 54
Hypothesized Mean Difference 0
df 61
t Stat -5.978327485
P(T<=t) one-tail 6.3552E-08
t Critical one-tail 1.670219484
P(T<=t) two-tail 1.27104E-07
t Critical two-tail 1.999623585

 

T-Tests for Vacuole Formation

T-Test: Two-Sample Assuming Unequal Variances
  Control: Treatment:
Mean 2.8 3.7
Variance 2.164102564 3.589473684
Observations 40 20
Hypothesized Mean Difference 0
df 31
t Stat -1.862207923
P(T<=t) one-tail 0.036038207
t Critical one-tail 1.695518783
P(T<=t) two-tail 0.072076413
t Critical two-tail 2.039513446
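The t Stat and df rows in the two T-test tables can be reproduced from the reported means, variances, and observation counts. This is a minimal sketch of the unequal-variance calculation using the standard Welch-Satterthwaite formula for the degrees of freedom; the function name `welch_t` is my own, and the P-values are not recomputed because that needs the t distribution, which is not in Python's standard library.

```python
import math

def welch_t(mean_1, var_1, n_1, mean_2, var_2, n_2):
    """Return (t statistic, approximate degrees of freedom) for an
    unequal-variance two-sample t-test computed from summary statistics."""
    a = var_1 / n_1
    b = var_2 / n_2
    t = (mean_1 - mean_2) / math.sqrt(a + b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (a + b) ** 2 / (a ** 2 / (n_1 - 1) + b ** 2 / (n_2 - 1))
    return t, df

# Means, variances, and counts copied from the tables above (control first).
cell_t, cell_df = welch_t(21000, 143357142.9, 57, 56388.88889, 1756393082, 54)
vac_t, vac_df = welch_t(2.8, 2.164102564, 40, 3.7, 3.589473684, 20)

print(f"cell counts: t = {cell_t:.4f}, df = {round(cell_df)}")
print(f"vacuoles:    t = {vac_t:.4f}, df = {round(vac_df)}")
```

The t statistic is negative in both cases simply because the control mean is listed first and is the smaller of the two. Comparing its absolute value with the two-tail critical value in each table gives the same decision as comparing the two-tail P-value with 0.05.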

 

Conclusion

In conclusion, this lab was a lot of fun. It was very convenient to use Excel to do all the work and get the statistics run right away. From the results, my conclusion is that Tetrahymena are affected by polypropylene. I say this because, as seen in the results above, the values are higher for the treatment. For cell counts, there were many more cells in the treatment than in the control (a mean of 56,389 versus 21,000, with a two-tail P-value of 1.27E-07). In the vacuole formation assay, more vacuoles formed in the treatment than in the control (a mean of 3.7 versus 2.8). This was not as large an increase as for the cell counts, and its two-tail P-value of 0.072 is above 0.05, so the vacuole result is weaker. I’m not sure what will come next, but from this week’s lab I can say that cells exposed to polypropylene had higher cell counts and, on average, formed more vacuoles.

Next Step

With this new information I am looking forward to next week’s lab. What’s next? Now that the research has begun, I am thoroughly enjoying it. I have this data and I can see that the polypropylene affected the Tetrahymena. I will take these statistics and compare them with other research that has been done. Can my results possibly support someone else’s findings? Are my results consistent with those of other groups in my section? I know the treatment changed the cell count and vacuole formation, but is it enough to reject the null hypothesis? My next step is to find answers to these questions as I continue my research and write my research paper.

 


Posted October 11, 2018 by marci_jordan1 in category Marci Jordan-31, Uncategorized
