
Learn Stata Programming: Get Started Mining Statistical Data

Stata is an application designed to support statistical analysis. It was developed by StataCorp, and released in 1985. Its name is derived from “statistics” and “data,” and it’s used primarily in data analysis and specialist research.
Despite being more than 30 years old, Stata is still in common usage. It allows every analysis to be fully documented, and it can produce graphics, simulations, and charts.
There are four different versions of the application, ranging from a student version through to a version for very large datasets. Stata can be installed on Mac, Windows, and Unix computers. The most common version is Stata/IC (IC stands for “Intercooled”).
Getting Started With Stata
Stata has its own built-in data editor, which looks similar to a spreadsheet editing window. At the bottom of the application is the Command window, where you type commands. The Review window logs all of the commands that are entered during a session. Results are shown in the central window.
When a dataset is loaded, Stata shows the variables and labels within it in the Variables and Properties windows.
If you want to play around with Stata without creating your own data, Stata comes with a range of example datasets, and an additional library of datasets from the manuals that can be downloaded from the internet. List the installed datasets with the sysuse dir command, and load one by typing sysuse followed by its name. Alternatively, choose File -> Example datasets, then click the use link next to the file name, or click the describe link to find out more about it.
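As a minimal sketch of the same steps typed as commands, the lines below list the example datasets, load one, and inspect it. The auto and lifeexp datasets are chosen only for illustration, and webuse needs an internet connection.

```stata
* List the example datasets installed with Stata
sysuse dir

* Load the auto example dataset, replacing any data already in memory
sysuse auto, clear

* Show the variables, their storage types, and their labels
describe

* Load a dataset from Stata's website (requires internet access)
webuse lifeexp, clear
```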
Bringing Commands and Data Into Stata
Stata can be controlled from the command line, by typing at the command prompt we mentioned above. Once you’ve used a command, you can re-use it by pressing PgUp until the command reappears in the window.
The application can also be controlled through a graphical user interface, or by importing a Do file (also called a syntax file), which is a series of pre-defined commands that are run as a script.
Seasoned Stata users usually recommend avoiding the graphical interface, but it provides an easy way to learn Stata’s programming language. Every time you run a command by pointing and clicking, the corresponding code is displayed in the Results window, so you can see what Stata is doing in the background.
The datasets you use can be imported into Stata from a CSV file, or a Stata file. In recent versions of Stata, you can directly import data from Excel using the import excel command.
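The sketch below shows one way to read a CSV file and a Stata file. To stay self-contained it first writes both files from an example dataset; the auto_copy file names are hypothetical, and import delimited and export delimited assume Stata 13 or later.

```stata
* Create sample files to import (hypothetical names, written to the working directory)
sysuse auto, clear
export delimited using "auto_copy.csv", replace
save "auto_copy.dta", replace

* Import the CSV file, replacing the data in memory
import delimited using "auto_copy.csv", clear

* Open the Stata file, replacing the data in memory
use "auto_copy.dta", clear
```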
Basic Stata Commands
Stata can perform different types of calculations and analysis, so it helps to have a basic working knowledge of its commands. Commands are case sensitive, and many of them can be abbreviated (summarize can be typed as su, for example).
In the section above, we mentioned the import excel command. This is a simple example of a Stata command in action:
import excel using filename.xls, ///
    sheet("Sheet1") cellrange(A1:D20) clear
This command specifies the sheet and the specific cells to import using the sheet and cellrange options, and the clear option replaces any data already in memory. If a single cell is specified as the cellrange, all of the data beyond that cell will be imported.
You will come across many other commands as you start working with Stata. Some of the basics are good to know:
- display shows the result of a calculation
- summarize displays summary statistics for the dataset in memory (follow it with the variables you want to examine)
- help shows the help for a command or function (use it alone, or follow it with the name of the command you need help with)
- if missing() is one of the many ways you can filter the data Stata returns when you query a dataset
- graph draws a graph of the data in the dataset; it must be followed by the type of graph and, for a two-way plot, the y-axis variable and then the x-axis variable
- describe displays information about a dataset and its variables
- nonew is an option for help that stops Stata from opening a new Viewer window each time you ask for help
- snapshot creates an undo point for your project (remember: Stata has no built-in undo command)
- clean is an option for list that returns the results of a query without a table border
- clear empties all data from RAM; used as an option on a command that loads data, it replaces the data already in memory. This is important, because Stata loads all of its data into RAM. When working with large datasets, this can cause the computer to slow down or crash
- findit searches for Stata extensions, or plug-ins, that can enhance its functionality
- /// tells Stata that the command continues on the next line; you can comment after the slashes if you wish, and the comments will be ignored providing they are on the same line
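- ; tells Stata the command is finished, but only after the delimiter has been switched with #delimit ; (by default, a command ends at the end of the line)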
- exit closes the application; this is the equivalent of clicking File -> Exit with your mouse
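Several of the commands above can be tried together. The following is a minimal sketch meant to be run as a Do file, since /// and // are only understood in Do files and not at the interactive prompt. The auto dataset and the variables price, mpg, rep78, and make come from Stata's example data; the graph title and snapshot label are arbitrary.

```stata
sysuse auto, clear            // load an example dataset, replacing data in memory

display 2 + 2                 // show the result of a calculation
summarize price mpg           // summary statistics for two variables
describe price mpg rep78      // storage type and label for each variable

* Filter with if: list the cars whose repair record is missing,
* using the clean option to drop the table border
list make rep78 if missing(rep78), clean

* Scatterplot with price on the y-axis and mpg on the x-axis;
* the /// continues the command on the next line
graph twoway scatter price mpg, ///
    title("Price against mileage")

* Create an undo point, change the data, then go back
snapshot save, label("before dropping rows")
local snap = r(snapshot)      // number assigned to the snapshot just saved
drop if missing(rep78)
snapshot restore `snap'
```

Storing r(snapshot) in a local macro avoids guessing the snapshot number, which depends on how many snapshots were saved earlier in the session.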
Once you get used to working in Stata, you can save commands to a Do file by using the Editor window. You can also save a text file with the .do extension, and then run your Do file in Stata using the do command, followed by the filename. Stata uses the same commenting methods as C++ and other languages; a double slash // comments out everything at the end of a line, while /* and */ can be used at the beginning and end, to comment out an entire section.
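A short Do file might look like the sketch below. The file name example.do is hypothetical; save the lines under that name and run them with do example.do. The last part also illustrates the ; delimiter from the command list, which applies only between #delimit ; and #delimit cr.

```stata
* example.do: a line starting with an asterisk is also a comment
sysuse auto, clear   // a double slash comments out the rest of the line

/* A block comment can span
   several lines */
summarize price

* Switch the delimiter so that commands end at a semicolon
#delimit ;
summarize mpg
    weight ;
#delimit cr

* Back to the default: each command ends at the end of the line
describe make
```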
Keeping Track of Your Work
One of Stata’s biggest strengths is its ability to log queries, which makes it invaluable for researchers who need to prove how they reached certain conclusions. In order for logging to be active, there are a few steps to follow.
- Create a directory for your project. By default, Stata will work in C:\DATA, so creating a separate directory keeps things neat.
- Turn logging on. Use the log using command, following it with the filename you want to use.
- Always save commands in a Do file. While this isn’t strictly necessary, it’s helpful when you want to reproduce a result or backtrack over your commands.
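The steps above can be sketched as a short Do file. The directory name stata_demo and the log name session.log are hypothetical; the text option writes a plain-text log in place of Stata's default SMCL format.

```stata
* Create a project directory (ignored if it already exists) and move into it
capture mkdir "stata_demo"
cd "stata_demo"

* Close any log left open by an earlier run, then start a new one
capture log close
log using "session.log", text replace

* Everything from here on is recorded in the log
sysuse auto, clear
summarize price mpg

* Stop logging
log close
```

The capture prefix suppresses the error that mkdir or log close would otherwise raise when the directory already exists or no log is open.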
Further reading
- StataCorp YouTube Channel: easy-to-digest help videos on various features in Stata.
- Statalist: a discussion forum where Stata users can swap information and help.
- UCLA Stata Resources: a directory of free courses, modules, links and FAQs on basic and intermediate use of Stata.
- Stata Tutorial from Princeton University: Germán Rodríguez’s excellent Stata guide.
- Do Files and Project Management: how to create Do files and keep track of your commands.
- Stata 14 Macros: a reference guide to some of the macros available in Stata version 14.
- StataCorp NetCourses: paid courses from the makers of Stata, designed to ease new users in at an affordable price.
Summary
Stata is an older programming language and development environment designed for solving statistical problems, but it is still widely used by an active community. If you do serious statistical work, Stata is a good language to know. With this introduction and our recommended resources, you should be on your way.