When big data in eLearning is discussed, I often reflect on the difference between the terms data and information. Data is, of course, raw recorded facts and figures. Data is not, in and of itself, actionable. Data must go through a process of analysis, organization, and presentation before it becomes actionable, and once it is actionable, it has become information. So how does this metamorphosis from data to information take place?
Much of that analysis is done with a programming language called Python.
Python is easy to learn, powerful, and free. Before you can do any serious data analysis with it, however, you have to learn the basics.
In this month’s article, I’ll demonstrate some Python basics, and then we’ll use the language for the simplest kind of data analysis.
Free Python development environment
There are dozens of Python environments. In fact, if you own a Mac, your computer can interpret Python from the command line with little or no setup. Figure 1 simply shows what this looks like, though you can use your Mac’s Python to get started if you wish.
Figure 1: The Macintosh OS can process Python “out of the box” using the command line tools. Here, Python commands are being issued in Python’s “interactive” mode.
For this tutorial, I recommend the web-based Python tool available at https://repl.it/languages/python (Figure 2). This tool runs Python code right in your web browser without any installation, so it will work on any machine with a browser and an internet connection.
Figure 2: This online Python environment, found at https://repl.it/languages/python, is surprisingly powerful
Once the website loads, you enter your code on the left, and its output appears on the right.
Now we’re ready to write our first lines of code in Python. Keep in mind that our goal here is not an exhaustive study of the Python language. Instead, I'm going to give you a quick introduction and allow you to fill in details later.
print() displays your results to the screen
Every language needs a way to display output on the screen. In Python, as in many other languages you may have heard of, that command is print(). We’ll use print() to output two kinds of values: strings and expressions. Let’s start with strings. Enter the following code into your editor, with each statement on its own line:
print("The eLearning Guild")
print("DevLearn")
print("Learning Solutions")
Once you’ve carefully typed the code, being cognizant of all of the symbols, click the prominent run button positioned above the code you just typed.
After a few moments, you should see the result of your efforts in the black terminal area to the right of your code. (Figure 3)
Figure 3: This is the result of the code you just entered
The output is probably what you expected. The string inside the parentheses is not evaluated by the Python interpreter; it is simply displayed, minus the quotes.
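Python accepts either single or double quotes around a string, and only the outer pair is removed when the string is displayed. The short sketch below uses example text of my own to show both points. It is just for reading. If you do try it, delete it before the next step so your line numbers match the instructions.

```python
# Single and double quotes both mark the start and end of a string
single = 'Learning Solutions'
double = "Learning Solutions"
print(single)
print(double)
print(single == double)   # True: both hold exactly the same text

# Only the outer quotes are removed when a string is displayed, so a
# string marked with single quotes can contain double quotes
quoted = 'Type "DevLearn" exactly as shown'
print(quoted)             # Type "DevLearn" exactly as shown
```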
Python can also display expressions. An expression is a value, or a combination of values and operators such as + and /, that Python evaluates to produce a result. Enter the following code starting on line four of the editor:
print(300)
print(45.5662)
print(29/3*2+21.6)
Click the run button again. You should notice that the single values are output directly, while the arithmetic expression is fully evaluated before its result is displayed.
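The quotes are what separate a string from an expression. The sketch below stores the same characters twice, once in quotes and once without, so you can compare what print() does with each. The variable names as_text and as_value are my own, chosen for illustration.

```python
# Quoted: Python treats the characters as text and displays them unchanged
as_text = "29/3*2+21.6"
# Unquoted: Python evaluates the arithmetic before displaying the result
as_value = 29/3*2+21.6

print(as_text)    # displays the characters exactly as typed
print(as_value)   # displays the evaluated result (about 40.93)

# * and / are applied before +, working left to right, so the
# expression above is evaluated as ((29 / 3) * 2) + 21.6
print(as_value == ((29 / 3) * 2) + 21.6)   # True
```

Notice that 29/3 produces a number with a decimal point. In Python 3, the / operator always performs true division.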
Try to print a few strings and expressions of your own so you get the hang of it. When you feel ready to move on, delete the code you’ve created so far so we can start anew.
Variables store values in memory
A variable is a placeholder in the computer’s memory. When we declare a variable, we essentially reserve a place in memory and store a value there. Variables can store strings or numbers. Try declaring a few:
name = "Mark Lassoff"
age = 45
band = "Journey"
gpa = 3.77
Looking at the first example, we’ve stored the first string (my name) in a variable called name. When we refer to name later, the location is determined and the string “Mark Lassoff” is retrieved. The variable called gpa refers to a location that stores the value 3.77.
We can output the value of a variable with a print statement. Add the following code:
print(name)
print(age)
print(band)
print(gpa)
Run your code with the run button and you should see that all of the variable values are printed to the terminal within the website. (Figure 4)
Figure 4: You have now printed all the variable values
Practice by replacing the variable values with your own name, age, and favorite band.
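Python keeps track of what kind of value each variable holds, and the built-in type() function will tell you. This sketch reuses three of the variables above. It then assigns a new value to age to show that the old value is simply replaced.

```python
name = "Mark Lassoff"
age = 45
gpa = 3.77

# type() reports the kind of value a variable currently holds
print(type(name))  # <class 'str'>   a string
print(type(age))   # <class 'int'>   a whole number
print(type(gpa))   # <class 'float'> a number with a decimal point

# Assigning a new value to an existing variable replaces the old one
age = 46
print(age)         # 46
```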
Conditionals allow programs to make decisions
Conditionals are, essentially, how computer programs make decisions. The most common conditional is the if statement. The general form of the if statement is:
if (this is true):
    do this
    and this
Inside the parentheses is the statement being evaluated. For example, if we wanted to test whether or not a user could legally consume alcohol in the US, we could test to make sure their age was greater than or equal to 21 like this:
if (age >= 21):
    print(name + " is able to legally drink")
If the condition evaluates to true, the code indented under the if statement executes. If not, it doesn’t. Using the variables we entered before (age is 45), the program execution would look like Figure 5.
Figure 5: A conditional and the result of its execution
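An if statement can also include an else clause, which runs when the condition is false. In this sketch, the wording of the else message is my own. The result is stored in a variable called message before it is printed so that you can inspect it.

```python
name = "Mark Lassoff"
age = 45

# The indented code under if runs when the condition is true;
# the indented code under else runs when it is false
if (age >= 21):
    message = name + " is able to legally drink"
else:
    message = name + " is not yet able to legally drink"

print(message)   # Mark Lassoff is able to legally drink
```

Change age to 19 and run it again to see the else branch execute.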
Loops repeat until complete
Loops keep doing something as long as some condition is true. For example, we could write Python code that counts to 100 with a loop. One of the most common types of loop in Python is the while loop.
x = 1
while (x < 101):
    print(x)
    x = x + 1
In this code snippet, we are declaring a variable called x. That variable will be our counter. It initially contains the value 1.
Remember that the loop executes while some condition is true. In this case, that condition is that the value of x is less than 101. The two lines of code indented under the while statement run as long as that condition holds, so once x reaches 101, the loop exits.
Let’s examine the two lines indented underneath the while statement. The first line prints the value of x (at first 1, then 2, then 3, and so on…). The next line adds 1 to the value of x. This continues until the condition in the while statement is no longer true and the loop exits. (Figure 6)
Figure 6: Examine the loop
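Python also has a for loop, which many Python programmers use when counting through a known range. As a sketch, here is the same count to 100 written with for and the built-in range() function.

```python
# range(1, 101) produces the whole numbers from 1 up to, but not
# including, 101, so this loop prints 1 through 100
for x in range(1, 101):
    print(x)
```

With for, Python manages the counter for you, so there is no x = x + 1 line to forget. After this loop finishes, x holds 100 rather than 101, because range() never produces the stop value.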
Practice on your own by creating a loop that counts from 1 to 50 by twos, and then one that counts from 100 backward to zero.
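Here is one possible solution to both exercises, written with while loops like the example above. It assumes that "by twos" means starting at 1 and adding 2 each time, which prints the odd numbers 1 through 49. Start x at 2 if you would rather print the even numbers 2 through 50.

```python
# Count from 1 to 50 by twos: 1, 3, 5, ... 49
x = 1
while (x <= 50):
    print(x)
    x = x + 2

# Count backward from 100 to zero, including zero
y = 100
while (y >= 0):
    print(y)
    y = y - 1
```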
A simple data analysis example
Let’s use Python to turn some simple data into information. Suppose we were teaching a class called Python 101. We gave an exam, and the grades were as follows:
65, 63, 97, 88, 75, 78, 100, 55, 73, 81, 90, 79, 88, 33, 92, 95, 82, 88, 71, 85, 81, 80, 70, 94
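Python can help us analyze that data. First, let’s store the data in a structure Python calls a list (many other languages call it an array). A list is essentially a variable that holds more than one value, written as values separated by commas inside square brackets.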
grades = [ 65, 63, 97, 88, 75, 78, 100, 55, 73, 81, 90, 79, 88, 33, 92, 95, 82, 88, 71, 85, 81, 80, 70, 94]
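Python calls this structure a list. Each value in it has a numbered position called an index, and Python starts counting at 0. The sketch below reuses the grades list to show how indexing and the built-in len() function work. Both appear in the analysis program that follows.

```python
grades = [65, 63, 97, 88, 75, 78, 100, 55, 73, 81, 90, 79,
          88, 33, 92, 95, 82, 88, 71, 85, 81, 80, 70, 94]

# Indexes start at 0, so grades[0] is the first grade
print(grades[0])                # 65
print(grades[2])                # 97

# len() reports how many values the list holds
print(len(grades))              # 24

# The last value sits at index len(grades) - 1
print(grades[len(grades) - 1])  # 94
```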
Let’s follow this with the analysis code and then we’ll review it.
x = 0
highGrade = 0
lowGrade = 100
averageGrade = 0
totalGrades = 0
# Visit each grade, tracking the highest, the lowest, and the running total
while (x < len(grades)):
    if (grades[x] > highGrade):
        highGrade = grades[x]
    if (grades[x] < lowGrade):
        lowGrade = grades[x]
    totalGrades = totalGrades + grades[x]
    x = x + 1
# The average is the total divided by the number of grades
averageGrade = totalGrades / len(grades)
print("Highest Grade: " + str(highGrade))
print("Lowest Grade: " + str(lowGrade))
print("Average Grade: " + str(averageGrade))
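This program takes our raw data, determines the highest and lowest scores, and then calculates the average. It makes use of all the programming structures I discussed previously. We begin by assigning initial values to our variables. Notice that highGrade starts at 0 and lowGrade starts at 100, the lowest and highest possible scores, so any actual grade can only raise highGrade or lower lowGrade.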
x = 0
highGrade = 0
lowGrade = 100
averageGrade = 0
totalGrades = 0
Next is the heart of the program. The while loop goes through the values in grades one at a time. The expression grades[x] retrieves the value at position x. Python numbers positions starting from 0, which is why x starts at 0 and the loop runs only while x is less than len(grades), the number of grades. For each value, the loop uses if statements to check whether it is the highest or lowest score seen so far. If it is, the value is stored in the appropriate variable. As the loop continues, the grades are totaled in a variable called totalGrades. Finally, at the end of each pass through the loop, x is incremented so that the next value in the array is accessed on the next iteration.
while (x < len(grades)):
    if (grades[x] > highGrade):
        highGrade = grades[x]
    if (grades[x] < lowGrade):
        lowGrade = grades[x]
    totalGrades = totalGrades + grades[x]
    x = x + 1
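If you want to watch the loop work, one option is to add a print() inside it. The sketch below is a hypothetical variant of the loop above. It uses a made-up variable called steps to stop after the first four grades, and it prints the running values on each pass.

```python
grades = [65, 63, 97, 88, 75, 78, 100, 55, 73, 81, 90, 79,
          88, 33, 92, 95, 82, 88, 71, 85, 81, 80, 70, 94]

x = 0
highGrade = 0
lowGrade = 100
totalGrades = 0
steps = 4  # hypothetical limit: trace only the first four grades

while (x < steps):
    if (grades[x] > highGrade):
        highGrade = grades[x]
    if (grades[x] < lowGrade):
        lowGrade = grades[x]
    totalGrades = totalGrades + grades[x]
    # Show the running values at the end of each pass
    print("x =", x, "grade =", grades[x], "high =", highGrade,
          "low =", lowGrade, "total =", totalGrades)
    x = x + 1
```

In the four lines it prints, highGrade jumps from 65 to 97 on the third pass, lowGrade settles at 63 after the second pass, and totalGrades reaches 313.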
The final part of the program outputs the results with print() statements. Because + can join a string only to another string, str() converts each number to a string first. (Figure 7)
Figure 7: The complete program and its output
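To see why the final print() statements wrap each number in str(), try joining a string and a number directly. This short sketch uses a stand-in value of 100 for highGrade. Python rejects the first attempt with a TypeError, and the try/except lets the program report the error and keep going.

```python
highGrade = 100  # stand-in value for illustration

# Joining a string and a number with + raises a TypeError
try:
    print("Highest Grade: " + highGrade)
except TypeError as error:
    print("TypeError:", error)

# str() converts the number to a string first, so + can join them
label = "Highest Grade: " + str(highGrade)
print(label)  # Highest Grade: 100
```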
While we’re obviously not doing any deep statistical analysis yet, you can hopefully begin to see the potential of using Python to massage raw data into information.
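Once you understand the loop, it may help to know that Python’s built-in max(), min(), sum(), and len() functions can produce the same three results in a few lines. This sketch reuses the grades list from above. The round() call only trims the average to two decimal places for display.

```python
grades = [65, 63, 97, 88, 75, 78, 100, 55, 73, 81, 90, 79,
          88, 33, 92, 95, 82, 88, 71, 85, 81, 80, 70, 94]

# Built-in functions replace the hand-written loop
highGrade = max(grades)
lowGrade = min(grades)
averageGrade = sum(grades) / len(grades)

print("Highest Grade: " + str(highGrade))                # Highest Grade: 100
print("Lowest Grade: " + str(lowGrade))                  # Lowest Grade: 33
print("Average Grade: " + str(round(averageGrade, 2)))   # Average Grade: 79.29
```

The loop version is still worth understanding, because it shows what these functions do for you behind the scenes.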