PROC Step: Printing Your Data

In data analysis, printing out observations is useful in many situations. For example, when exploring a new dataset, printing a subset of observations gives you an initial sense of the data's structure, content, and quality. When cleaning your data, reviewing printed observations helps you spot outliers and missing values. Occasionally, printed observations are also required in documentation to make your data reports more reproducible and reliable.

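SAS provides two procedures that work together for this task: PROC PRINT, which prints the observations in a dataset, and PROC FORMAT, which lets you define custom formats that control how values are displayed. In this guide, we will delve into the basic usage of these two procedures with some practical examples. Let's get started!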


PROC PRINT

PROC PRINT is perhaps the most widely used procedure in SAS programming. It prints the observations in a dataset, in whole or in part, which makes it handy for initial data exploration, data cleaning, and documentation. The basic syntax of the procedure is:

PROC PRINT DATA=sas_dataset;
TITLE 'Your Title'; /* Optional statement to add title */
FOOTNOTE 'Your footnotes'; /* Optional statement to add footnotes */
RUN;

For any procedure, if you do not specify a dataset, SAS uses the most recently created one. PROC PRINT is no exception. In practice, it is almost always better to specify the DATA= option explicitly, because it is often hard to tell at a glance which dataset was created last.

In addition to DATA=, some useful options for PROC PRINT are:

  • NOOBS: By default, SAS prints the observation numbers along with the variables. If you don't want observation numbers, add the NOOBS option to the PROC PRINT statement.
  • LABEL: This option prints variable labels instead of variable names in the output. This makes the output more readable, which is particularly useful for documentation.
  • (OBS=n): This dataset option, placed in parentheses right after the dataset name, prints only the first n observations.

All of these options can be used together in a single PROC PRINT statement.
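
As a minimal sketch, assuming the sample dataset sashelp.class that ships with SAS (the label texts are illustrative, not part of that dataset), the three options could be combined like this:

```sas
/* Assumes the sample dataset sashelp.class, which ships with SAS.      */
/* (OBS=5) is a dataset option, so it sits right after the dataset name. */
/* NOOBS and LABEL are options of the PROC PRINT statement itself.       */
PROC PRINT DATA=sashelp.class (OBS=5) NOOBS LABEL;
   /* Labels assigned inside a procedure apply only to this step. */
   LABEL name   = 'Student name'
         height = 'Height (inches)'
         weight = 'Weight (pounds)';
RUN;
```

Without the LABEL option on the PROC PRINT statement, the LABEL statement alone would not change the column headings.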


After the PROC PRINT statement and its options, you can add any of the following optional statements as needed:

  • BY variable-list;
    • In the context of PROC PRINT, the BY statement starts a new section in the output for each new value of the BY variables and prints the values of the BY variables at the top of each section. Note that the data must be presorted by the BY variables.
  • ID variable-list;
    • When you use the ID statement, the observation numbers are not printed. Instead, the variables in the ID variable list appear on the left-hand side of the page.
  • SUM variable-list;
    • The SUM statement prints sums for the variables in the list.
  • VAR variable-list;
    • The VAR statement specifies which variables to print and the order. Without a VAR statement, all variables in the SAS dataset are printed in the order that they occur in the dataset.
  • FORMAT variable format;
    • You can change the appearance of printed values using standard data formats. 
    • For numeric values, you can specify a format along with the width w and the number of decimals d (formatw.d). Note that the decimal point and the d decimal places count toward w. For example, 5.3 can display values up to 9.999.
    • For character values, you must put a dollar sign in front to indicate that it is a character format ($formatw.). It takes only the width w.
    • Internally, a SAS dataset has only two data types: numeric and character. Date values are stored as the number of days since Jan 1, 1960. Thus, to display them as readable dates, you must apply a date format.
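
The optional statements above might be combined as in the following sketch. It assumes the sample dataset sashelp.class that ships with SAS; the output dataset name work.class_sorted is a placeholder chosen for illustration.

```sas
/* BY-group processing needs sorted data, so sort a copy first. */
PROC SORT DATA=sashelp.class OUT=work.class_sorted;
   BY sex;
RUN;

PROC PRINT DATA=work.class_sorted;
   BY sex;                    /* start a new section for each value of sex   */
   ID name;                   /* show name on the left instead of Obs numbers */
   VAR age height weight;     /* variables to print, in this order           */
   SUM weight;                /* print the sum of weight                     */
   FORMAT height weight 6.1;  /* total width 6, with 1 decimal place         */
RUN;
```

Because a BY statement is present, the SUM statement here produces a subtotal for each BY group as well as a grand total.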

Here are some selected standard formats that are commonly used in everyday practice:

Character

Syntax | Description | Example | Format | Result
$UPCASEw. | Converts character values to upper case. w ranges 1-32767, defaults to 8. | my cat | $UPCASE6. | MY CAT
$w. | Writes standard character data; does not trim leading blanks (same as $CHARw.). w ranges 1-32767, defaults to 1. | my cat | $8. '*' | my cat  *
 |  |  my snake | $8. '*' |  my snak*

In the $w. example, '*' is a literal written immediately after the formatted value, which makes the 8-character field width visible: "my cat" is padded with two trailing blanks, and " my snake" (with its leading blank) is truncated to 8 characters.

Date, Time, and Datetime

Syntax | Description | Example | Format | Result
DATEw. | Writes SAS date values in form ddmmmyy or ddmmmyyyy. w ranges 5-11, defaults to 7. | 8966 | DATE7. | 19JUL84
 |  | 8966 | DATE9. | 19JUL1984
DATETIMEw.d | Writes SAS datetime values in form ddmmmyy:hh:mm:ss.ss. w ranges 7-40, defaults to 16. | 12182 | DATETIME13. | 01JAN60:03:23
 |  | 12182 | DATETIME18.1 | 01JAN60:03:23:02.0
DTDATEw. | Writes SAS datetime values in form ddmmmyy or ddmmmyyyy. w ranges 5-9, defaults to 7. | 12182 | DTDATE7. | 01JAN60
 |  | 12182 | DTDATE9. | 01JAN1960
EURDFDDw. | Writes SAS date values in form dd.mm.yy or dd.mm.yyyy. w ranges 2-10, defaults to 8. | 8966 | EURDFDD8. | 19.07.84
 |  | 8966 | EURDFDD10. | 19.07.1984
JULIANw. | Writes SAS date values in Julian date form yyddd or yyyyddd. w ranges 5-7, defaults to 5. | 8966 | JULIAN5. | 84201
 |  | 8966 | JULIAN7. | 1984201
MMDDYYw. | Writes SAS date values in form mm/dd/yy or mm/dd/yyyy. w ranges 2-10, defaults to 8. | 8966 | MMDDYY8. | 07/19/84
 |  | 8966 | MMDDYY6. | 071984
TIMEw.d | Writes SAS time values in form hh:mm:ss.ss. w ranges 2-20, defaults to 8. | 12182 | TIME8. | 3:23:02
 |  | 12182 | TIME11.2 | 3:23:02.00
WEEKDATEw. | Writes SAS date values in form day-of-week, month-name dd, yy or yyyy. w ranges 3-37, defaults to 29. | 8966 | WEEKDATE15. | Thu, Jul 19, 84
 |  | 8966 | WEEKDATE29. | Thursday, July 19, 1984
WORDDATEw. | Writes SAS date values in form month-name dd, yyyy. w ranges 3-32, defaults to 18. | 8966 | WORDDATE12. | Jul 19, 1984
 |  | 8966 | WORDDATE18. | July 19, 1984

Numeric

Syntax | Description | Example | Format | Result
BESTw. | SAS decides the best format; this is the default format for numeric data. w ranges 1-32, defaults to 12. | 1200001 | BEST6. | 1.20E6
 |  | 1200001 | BEST8. | 1200001
COMMAw.d | Writes numbers with commas. w ranges 2-32, defaults to 6. | 1200001 | COMMA9. | 1,200,001
 |  | 1200001 | COMMA12.2 | 1,200,001.00
DOLLARw.d | Writes numbers with a leading $ and commas separating every three digits. w ranges 2-32, defaults to 6. | 1200001 | DOLLAR10. | $1,200,001
 |  | 1200001 | DOLLAR13.2 | $1,200,001.00
Ew. | Writes numbers in scientific notation. w ranges 7-32, defaults to 12. | 1200001 | E7. | 1.2E+06
EUROXw.d | Writes numbers with a leading € and periods separating every three digits. w ranges 2-32, defaults to 6. | 1200001 | EUROX13.2 | €1.200.001,00
PERCENTw.d | Writes numeric data as percentages. w ranges 4-32, defaults to 6. | 0.05 | PERCENT9.2 | 5.00%
w.d | Writes standard numeric data. w ranges 1-32. | 23.635 | 6.3 | 23.635
 |  | 23.635 | 5.2 | 23.64
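
One way to try a few of these formats is a DATA _NULL_ step that writes formatted values to the SAS log with PUT statements. The sketch below reuses example values from the table; the variable names d, t, n, and c are arbitrary.

```sas
/* DATA _NULL_ runs the step without creating a dataset; */
/* each PUT statement writes one line to the SAS log.    */
DATA _NULL_;
   d = 8966;       /* SAS date value: days since 01JAN1960     */
   t = 12182;      /* SAS time value: seconds since midnight   */
   n = 1200001;    /* plain numeric value                      */
   c = 'my cat';   /* character value                          */

   PUT d DATE9.;       /* writes 19JUL1984 */
   PUT d WORDDATE18.;  /* month name, day, and four-digit year */
   PUT t TIME8.;       /* hours, minutes, and seconds          */
   PUT n COMMA9.;      /* writes 1,200,001 */
   PUT n DOLLAR13.2;   /* leading $, commas, two decimals      */
   PUT c $UPCASE6.;    /* writes MY CAT */
RUN;
```

The same formats can be attached to variables in a FORMAT statement inside PROC PRINT; the PUT approach is simply a quick way to check how a value will look.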


PROC FORMAT

Sometimes the standard formats are not enough, and you need custom formats of your own. In particular, when raw data values are stored as codes, displaying them through user-defined formats is very convenient because readers no longer need a code book to interpret the data. In SAS, you can achieve this through PROC FORMAT.

The FORMAT procedure creates formats that will later be associated with variables in a FORMAT statement. The procedure starts with the statement PROC FORMAT and continues with one or more VALUE statements (other optional statements are available):

PROC FORMAT;
VALUE name range-1 = 'formatted-text-1'
range-2 = 'formatted-text-2'
range-n = 'formatted-text-n';
RUN;

Here, name is the name of the format you are creating. If the format is for character data, the name must start with a dollar sign ($name). Format names must be unique, can be up to 32 characters long (including the $ for character data), must not start or end with a number, and cannot contain any special characters except underscores.

In the VALUE statement, each range is a value (or set of values) of a variable that is mapped to the text given in quotation marks on the right side of the equal sign. The formatted text can be up to 32,767 characters long, but some procedures print only the first 8 or 16 characters.

For example, consider a dataset from a survey in which the responses are stored as integer codes. To make such a dataset easier to interpret, you can define formats with PROC FORMAT and apply them when printing.
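
A minimal sketch of this idea follows. The survey dataset is hypothetical: the dataset name work.survey, the variable names, the codes, and the formatted texts are all assumptions made for illustration.

```sas
/* Hypothetical survey data: sex and answer are integer codes, */
/* region is a one-letter character code.                       */
DATA work.survey;
   INPUT id sex answer region $;
   DATALINES;
1 1 3 N
2 2 1 S
3 2 2 N
4 1 1 S
;
RUN;

/* Define one format per coded variable.                 */
/* The character format name starts with a dollar sign.  */
PROC FORMAT;
   VALUE sexfmt  1 = 'Male'
                 2 = 'Female';
   VALUE ansfmt  1 = 'Disagree'
                 2 = 'Neutral'
                 3 = 'Agree';
   VALUE $regfmt 'N' = 'North'
                 'S' = 'South';
RUN;

/* Attach the formats when printing; note the period after each format name. */
PROC PRINT DATA=work.survey NOOBS;
   FORMAT sex sexfmt. answer ansfmt. region $regfmt.;
RUN;
```

The stored values in work.survey are unchanged; the formats only affect how the values are displayed.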


In the VALUE statement, character values must be enclosed in quotation marks. If a range contains more than one value, you can separate the values with a comma (,) or use a hyphen (-) for a continuous range. The keywords LOW and HIGH can be used in ranges to indicate the lowest and highest non-missing value of the variable. You can also use the less-than symbol (<) in a range to exclude either end point. The OTHER keyword assigns a format to any values not listed in the VALUE statement.
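
These range features could be sketched as follows, assuming the sample dataset sashelp.class that ships with SAS. The format names agegrp and $sexfmt, the cut points, and the formatted texts are illustrative choices.

```sas
PROC FORMAT;
   VALUE agegrp  LOW -< 13 = 'Under 13'      /* lowest value up to, but excluding, 13  */
                 13, 14    = '13 or 14'      /* comma-separated list of values         */
                 15 - HIGH = '15 or older'   /* 15 through the highest value           */
                 OTHER     = 'Unknown';      /* anything else, including missing values */
   VALUE $sexfmt 'F'   = 'Female'            /* character values go in quotation marks */
                 'M'   = 'Male'
                 OTHER = 'Not recorded';
RUN;

PROC PRINT DATA=sashelp.class NOOBS;
   VAR name sex age;
   FORMAT age agegrp. sex $sexfmt.;
RUN;
```

Note that a non-integer age such as 14.5 would not match any of the listed ranges here and would fall into OTHER.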
