Gretl Empirical Exercise 1
At this point you should all have downloaded and installed Gretl. If you have not yet done so, do this now using the instructions in the syllabus!
Getting Started with Gretl
Once you have successfully installed Gretl, you should have a Gretl icon on your desktop that looks like this:
Double-click this icon to open the program. You should see this screen:
This is the Gretl main page, and it is what you will see every time you open Gretl.
As you can see, right now it says that no datafile is loaded, so the first thing we need to learn to do is load data into the program.
Loading Data into Gretl
Let’s first load a sample dataset provided by Gretl. In the toolbar, click: file > open data > sample file
The following screen will appear:
Be sure the “Gretl” tab is selected on top, then scroll down until you see the file named “engel” (see screenshot below). Double-click on this file name.
You have now loaded this dataset into memory and will now see this screen:
You can see that there are three variables included in this dataset. Gretl automatically generates a constant term (this is useful when we begin to run regressions). Also included is data on annual food expenditures and household income, both of which are measured in Belgian francs. Now, in order to see what types of values our variables take on, we would like to display the data in a table…
Displaying a dataset in a table
To do this, in the toolbar click data > select all. This will highlight all of the variables included in the dataset, excluding the constant. Next, click data > display values. A new window will appear:
You can scroll down to see that this is cross-sectional data for 235 different households (observations).
We may want a quick way to calculate the mean and standard deviation of these variables. Gretl will calculate these statistics and many others quite easily. Once again, select all of the variables, then click view > summary statistics. A window will appear showing you which variables you have selected; click OK. The following summary statistics will appear in a new window:
Gretl reports the sample mean, median, minimum, maximum, standard deviation, coefficient of variation, skewness, and excess kurtosis.
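To make these statistics concrete, here is a minimal Python sketch of how each one can be computed by hand. It is not Gretl's own code: the function name summary and the toy_income values are invented for illustration, and the skewness and excess kurtosis formulas shown (central moments divided by n) are an assumption, so Gretl's output may differ slightly if it applies different small-sample conventions.

```python
import math
import statistics


def summary(values):
    """Return the descriptive statistics listed above for one variable."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)

    # Central moments dividing by n; assumed convention for skewness/kurtosis.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n

    return {
        "mean": mean,
        "median": statistics.median(values),
        "minimum": min(values),
        "maximum": max(values),
        "std_dev": sd,
        "coef_variation": sd / abs(mean),  # standard deviation relative to the mean
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # 0 for a normal distribution
    }


# Made-up numbers, NOT values from the engel dataset.
toy_income = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = summary(toy_income)
for name, value in stats.items():
    print(f"{name:>16}: {value:.4f}")
```

A positive skewness, as in this toy example, means the right tail of the distribution is longer than the left, which is common for income data.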
Plotting the data in a scatterplot
As you have seen in Chapter 1, one of the first things an econometrician can do to determine if a relationship exists between two variables is to plot the data in a scatterplot. We can do this using Gretl quite easily. First, select the two variables you wish to analyze. You can do this by holding down CTRL and clicking on the desired variables, or in this case you can simply “select all” again, then right-click in the highlighted area and click “XY scatterplot”. A box will appear asking you to select which variable to put on the x-axis (horizontal); in this case, choose “income” and click OK. The following graph will appear:
As you can see, Gretl will plot each data point and fit a blue OLS regression “best-fit” line through those points. Here we see that income and food expenditures have a positive relationship. As the course progresses we will learn exactly how we estimate this line using ordinary least squares regression.
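As a preview of what the best-fit line involves, the following Python sketch computes the intercept and slope of a simple OLS line using the standard textbook formulas. It only illustrates the idea and is not how Gretl is implemented; the function name ols_line and the toy data are made up and are not taken from the engel dataset.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Slope: sum of cross-deviations of x and y over sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx

    # The fitted line always passes through the point of means (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up numbers for illustration only.
toy_income = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_foodexp = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = ols_line(toy_income, toy_foodexp)
print(f"fitted line: foodexp = {intercept:.2f} + {slope:.2f} * income")
```

A positive slope corresponds to an upward-sloping line in the scatterplot, which is the pattern described above for income and food expenditures.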
If you right-click in the scatterplot window you will see that you have options to save the graph as a pdf or png (picture) file. This will be useful to you when submitting this and future homework assignments.
Now that you have learned some Gretl basics, let’s load some new data and generate summary statistics and a few scatterplots.
This time let’s see how to import a dataset that is in Excel (.xls) format. NOTE: In order for Gretl to import Excel data, it must be in the older Excel (.xls) format, NOT the newer (.xlsx) format. The Excel file included with this empirical exercise (gretl_ex1.xls) includes data on four different variables: child mortality (CM), female literacy rate (FLR), per capita gross national product (PGNP), and total fertility rate (TFR) for 64 different countries.
First, open the file and view the data in Excel. Save it to your computer, making sure you save it in its present (.xls) format. Notice that the first variable name is in cell A1, the second variable name is in cell B1, and so on. This MUST be the case! Also, the data values begin directly below the variable names in row 2; this MUST also be the case.
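The layout rule above (variable names in row 1 starting at cell A1, data values from row 2 onward) can be expressed as a small check. The Python sketch below is purely illustrative: check_layout is a hypothetical helper, not part of Gretl, it works on rows already held in memory as lists instead of reading the .xls file, and the numbers are invented, not taken from gretl_ex1.xls.

```python
def check_layout(rows):
    """Check a sheet given as a list of rows; rows[0] is spreadsheet row 1.

    Returns a list of problems; an empty list means the layout looks fine.
    """
    if not rows:
        return ["sheet is empty"]

    problems = []
    header = rows[0]

    # Row 1 must hold a variable name in every column, starting at cell A1.
    for col, name in enumerate(header, start=1):
        if not isinstance(name, str) or not name.strip():
            problems.append(f"column {col}: no variable name in row 1")

    # Data values must start in row 2, directly below the variable names.
    for r, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {r}: expected {len(header)} cells, found {len(row)}")
            continue
        for col, cell in enumerate(row, start=1):
            if isinstance(cell, bool) or not isinstance(cell, (int, float)):
                problems.append(f"row {r}, column {col}: value is not numeric")
    return problems


# Invented values for illustration only.
good_sheet = [
    ["CM", "FLR", "PGNP", "TFR"],
    [100, 50, 1000, 5.0],
    [120, 40, 800, 6.1],
]
bad_sheet = [
    ["", "FLR"],
    [1, "n/a"],
]

print(check_layout(good_sheet))
print(check_layout(bad_sheet))
```

This simple check treats every non-numeric cell as a problem, so it would also flag blank cells; a real dataset with missing values would need a more forgiving rule.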
Now, close the Excel file and open Gretl.
Click file > open data > import > Excel, as shown below:
You will then see this pop-up window:
Click OK. This is why we had to make sure that the data began at cell A1 in our Excel spreadsheet. Once you click OK, you will see this prompt:
This is just telling us that Gretl interpreted this data as cross-sectional and is verifying that this is actually the case. In this case, it is; we have data on 64 different countries, so click NO.
At this point the data is now loaded into memory. Complete the following tasks:
1. Display the data in a table. You need only show me the first 15 or so rows.
2. Generate summary statistics for all four variables.
3. Generate an XY scatterplot of child mortality (CM) and female literacy rates (FLR) with FLR being measured on the x-axis. Is there a relationship between these variables? Is this the relationship you would expect? Why or why not?
4. Generate an XY scatterplot of child mortality (CM) and per capita gross national product (PGNP) with PGNP being measured on the x-axis. Is there a relationship between these variables? Is this the relationship you would expect? Why or why not?
5. Generate an XY scatterplot of total fertility rates (TFR) and per capita gross national product (PGNP) with PGNP being measured on the x-axis. Is there a relationship between these variables? Is this the relationship you would expect? Why or why not?