PROC Step: Using SAS Procedures

Early computer programming treated programs as mere lists of instructions. The focus was on ordering the instructions so that the machine carried out the tasks the programmer intended. Organization and reusability of code received less attention, leading to programs that prioritized functionality over maintainability. These programs, often referred to as "spaghetti code," became tangled and difficult to understand as they grew in complexity.


A turning point arrived in 1968 with the work of Dutch computer scientist Edsger W. Dijkstra. A pioneer of structured programming, he argued against the overuse of "GO TO" statements in his influential paper, "Go To Statement Considered Harmful." He pointed out that code with frequent jumps obscures the overall program flow and logic, making it hard to identify and fix errors. Structured programming instead favors decomposing programs into smaller, self-contained procedures with well-defined interfaces. These procedures can be called from anywhere in a program, which promotes modularity and reusability.

SAS procedures (PROCs) are pre-built routines that ship with the SAS environment. They are tested by SAS, and most analysis and reporting tasks, along with many data manipulation tasks, are carried out through them (the DATA step handles the rest). By providing well-tested code as PROCs, SAS promotes statistical reliability, as well as the maintainability and reusability of common operations across a SAS program. Here's a breakdown of what PROCs can do:

  • Data Analysis and Manipulation: PROCs can sort, summarize, and analyze data. This includes calculating statistics, creating tables and reports, and performing various statistical tests. (e.g., PROC MEANS for descriptive statistics, PROC FREQ for frequency tables)
  • Reporting and Visualization: PROCs can generate various charts, graphs, and formatted reports to visualize your data analysis. (e.g., procedures in SAS/GRAPH for creating visualizations)
  • SQL Queries: PROC SQL lets you write SQL queries within SAS, against SAS datasets as well as relational databases. (e.g., PROC SQL for database queries)

In terms of data manipulation, the DATA step exists mainly to create a new SAS dataset, whereas most PROC steps read an existing SAS dataset and either produce a report or apply a pre-built operation, such as sorting, to it. Some procedures can also write an output dataset when asked to (for example, through the OUT= option of PROC SORT), but this is something you request rather than the purpose of the step.
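The contrast between the two kinds of step can be sketched as follows. This is a minimal illustration that assumes the SASHELP.CARS sample dataset is available in your installation; the name work.cars_copy is made up for the example.

```sas
/* DATA step: creates a new dataset, work.cars_copy (hypothetical name). */
data work.cars_copy;
    set sashelp.cars;
run;

/* PROC SORT: reorders work.cars_copy in place, since no OUT= option is given. */
proc sort data=work.cars_copy;
    by msrp;
run;

/* PROC MEANS: reads the dataset and prints a report; the data are unchanged. */
proc means data=work.cars_copy;
    var msrp;
run;
```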

PROC Statement

Each PROC has its own set of options and statements for producing the desired output, but all of them follow the same basic form. A PROC step begins with the keyword PROC, followed by the procedure's name, such as CONTENTS, MEANS, or SORT. Any options come after the name. For example, the DATA= option specifies which SAS dataset is used as input for the procedure. If it is omitted, SAS uses the most recently created dataset, which is not necessarily the one you intend to use.

To apply a procedure on a permanent SAS dataset, you may include the dataset's two-level name in the DATA option. For example:
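A minimal sketch, assuming the SASHELP library and its CARS sample dataset are available. SASHELP is the library reference (libref) and CARS is the dataset name, so together they form the two-level name:

```sas
/* DATA= takes the two-level name libref.dataset. */
proc contents data=sashelp.cars;
run;
```

A dataset in a library you have assigned yourself with a LIBNAME statement is referenced the same way, with your own libref in place of SASHELP.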


TITLE and FOOTNOTE Statements

The TITLE and FOOTNOTE statements are used to add titles and footnotes, respectively, to your PROC result. Both TITLE and FOOTNOTE statements are global statements, meaning that they are technically not a part of any PROC or DATA step. However, considering that the statements apply to the procedure output, it generally makes sense to put them with the procedure.

The TITLE statement consists of the keyword TITLE followed by your desired title enclosed in quotation marks. The FOOTNOTE statement follows the same syntax, with the keyword FOOTNOTE preceding your footnote text enclosed in quotation marks. You can use either single or double quotation marks. For plain text there is no functional difference, but macro variable references (such as &name) are resolved only inside double quotation marks.

If your title or footnote text contains an apostrophe, you have two options: you can enclose the text in double quotation marks, or you can keep the single quotation marks and write the apostrophe twice (two consecutive single quotation marks), which SAS reads as one literal apostrophe. For example:
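A short sketch of both options, assuming the SASHELP.CARS sample dataset is available. The title and footnote wording is invented for illustration:

```sas
/* Option 1: enclose the text in double quotation marks. */
title "Manufacturer's Suggested Retail Price";

/* Option 2: keep single quotation marks and double the apostrophe. */
footnote 'Source: SAS''s sample library (SASHELP)';

proc print data=sashelp.cars(obs=5);
    var make model msrp;
run;
```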


Titles and footnotes stay in effect until you replace them with new ones or cancel them with a null statement. For example:
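A minimal sketch, assuming the SASHELP.CARS sample dataset is available (the title and footnote text is invented). The first report carries the title and footnote; the null statements then cancel them for everything that follows:

```sas
title 'Cars Priced Under $20,000';
footnote 'Data: SASHELP.CARS';

proc print data=sashelp.cars(obs=5);
    where msrp < 20000;
    var make model msrp;
run;

/* Null statements: cancel all current titles and footnotes. */
title;
footnote;

/* This report is printed without a title or footnote. */
proc print data=sashelp.cars(obs=5);
    var make model msrp;
run;
```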


When you specify a new title or footnote, it replaces the old texts with the same number and cancels those with a higher number. One procedure can have up to 10 titles and footnotes. For example:
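The numbering rule can be sketched like this, again assuming SASHELP.CARS is available and using made-up title text. TITLE is equivalent to TITLE1, and the same numbering applies to FOOTNOTE1 through FOOTNOTE10:

```sas
title1 'Car Inventory Report';
title2 'All Manufacturers';
title3 'Prices in US Dollars';

/* Printed with all three title lines. */
proc print data=sashelp.cars(obs=5);
    var make model msrp;
run;

/* Replaces TITLE2 and cancels TITLE3; TITLE1 stays in effect. */
title2 'Asian Manufacturers Only';

/* Printed with TITLE1 and the new TITLE2 only. */
proc print data=sashelp.cars(obs=5);
    where origin = 'Asia';
    var make model msrp;
run;
```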


LABEL Statements

By default, SAS uses variable names to label your output. However, if you require more descriptive names for your variables, you can create them using the LABEL statements. Each label can be up to 256 characters long. For example:
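A minimal sketch, assuming the SASHELP.CARS sample dataset is available; the label text is invented for illustration. Note that PROC PRINT displays labels only when its LABEL option is specified, whereas many other procedures show labels automatically:

```sas
proc print data=sashelp.cars(obs=5) label;
    var make model msrp mpg_city;
    /* One LABEL statement can assign labels to several variables. */
    label msrp     = 'Suggested Retail Price (USD)'
          mpg_city = 'City Mileage (MPG)';
run;
```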


Note that when a LABEL statement is used in a DATA step, the labels become part of the dataset. On the other hand, when used in a PROC step, the labels stay in effect only for the duration of that particular step.

BY Statement

The BY statement names the variable(s) by which a procedure orders or groups observations. It is required for PROC SORT, where it specifies the sort keys. For all other PROCs, the BY statement is optional.

The variables listed in the BY statement are referred to as BY variables. When used in a PROC, other than PROC SORT, the BY statement instructs SAS to perform separate analyses for each unique combination of the BY variable values. However, it is important to note that for this functionality to work, a SAS dataset must be pre-sorted by the BY variables, typically achieved through PROC SORT. Otherwise, SAS will throw an error. For example:
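A sketch of the failing case, assuming the SASHELP.CARS sample dataset is available and that, as it appears to be, it is not stored in order of Origin:

```sas
/* BY without a prior sort: expected to fail if Origin is not in sorted order. */
proc means data=sashelp.cars;
    by origin;
    var msrp;
run;
```

If the rows are indeed out of order, the log should report an error along the lines of the dataset not being sorted in ascending sequence, and the step stops.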


The SAS log reports an error when a BY statement is used in PROC MEANS on observations that have not been sorted by the BY variable. Once the observations are sorted by the BY variable, SAS applies the MEANS procedure separately to each unique value of that variable. For example:
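A minimal sketch of the working case, assuming the SASHELP.CARS sample dataset is available. The output name work.cars_sorted is hypothetical; OUT= is used so that the sorted rows go to a new dataset rather than overwriting the sample data:

```sas
/* Step 1: sort by the BY variable, writing the result to a WORK dataset. */
proc sort data=sashelp.cars out=work.cars_sorted;
    by origin;
run;

/* Step 2: one set of summary statistics per value of Origin. */
proc means data=work.cars_sorted;
    by origin;
    var msrp;
run;
```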



Subsetting in Procedures with the WHERE Statement

One optional statement for any PROC that reads a SAS dataset is the WHERE statement. It allows you to specify a subset of the data to be used in the analysis. While you can also achieve this through a DATA step with IF statements, the WHERE statement serves as a convenient shortcut. Unlike subsetting IFs, which create a new SAS dataset after filtering, the WHERE statement in a PROC directly filters observations and applies the procedure on the current dataset. Thus, it is typically more efficient to use the WHERE statement than to first use subsetting IFs and then apply the procedure.
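The two approaches can be compared in a short sketch, assuming the SASHELP.CARS sample dataset is available; the dataset name work.acura is made up for the example:

```sas
/* Approach 1: subsetting IF in a DATA step, then the procedure (two steps, one extra dataset). */
data work.acura;
    set sashelp.cars;
    if make = 'Acura';
run;

proc means data=work.acura;
    var msrp;
run;

/* Approach 2: WHERE inside the procedure (one step, no extra dataset). */
proc means data=sashelp.cars;
    where make = 'Acura';
    var msrp;
run;
```

Both PROC MEANS steps summarize the same observations; the second avoids creating an intermediate dataset.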

 

Here are the most frequently used operators for conditional expressions:

Symbolic       Mnemonic         Example
=              EQ               WHERE Make = 'Acura';
^=, ~=, <>     NE               WHERE Make ^= 'Acura';
>              GT               WHERE MSRP > 40000;
<              LT               WHERE MSRP < 40000;
>=             GE               WHERE MSRP >= 40000;
<=             LE               WHERE MSRP <= 40000;
&              AND              WHERE Make = 'Acura' AND MSRP <= 40000;
|, !           OR               WHERE Make = 'Acura' OR Make = 'Audi';
               IS NOT MISSING   WHERE MSRP IS NOT MISSING;
               BETWEEN AND      WHERE MSRP BETWEEN 30000 AND 40000;
               CONTAINS         WHERE Make CONTAINS 'ura';
               IN (LIST)        WHERE Make IN ('Acura', 'Audi', 'BMW');
