Sunday, 2 September 2012

Lesson 3: Appearance of Output, Infile, Labels and Informats




Keeping Up Appearances
Suppose your output doesn't fit on the page the way you like when you print it or paste it into a word processor. An options statement can be used to adjust the number of lines and columns used in page formatting. In an options statement, "ps" stands for "pagesize" (you can also spell it out if you like); this is the number of lines on a page. Next, "ls" stands for "linesize" (you can spell this out too), which is the width of a line in characters. If you don't want the date and time displayed, you can include "nodate", and if you'd like to reset the starting page number, "pageno=n" will do that, where you replace "n" with the number you want. If the page number is not reset, page numbers keep incrementing in the output window, even if you clear it. There are many more options available (see the SAS help or manuals).
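A minimal sketch of such an options statement. The values 60, 80 and 1 are illustrative choices only, not recommendations:

```sas
/* 60 lines per page, 80 characters per line, no date/time,  */
/* and page numbering restarted at 1 (illustrative values)   */
options ps=60 ls=80 nodate pageno=1;

/* Equivalent statement with the option names spelled out */
options pagesize=60 linesize=80 nodate pageno=1;
```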

Reading from a File
Now, back to reading data. Our example data set is small and easy enough to type in. If a data set is large, however, it may not be convenient to type everything into the program. In that case, the data may be saved in a text file outside of SAS. Let's say we had the same data saved in a file called F:\sample.txt. The following program would then have almost the same effect as the previous one (can you tell what will be different?):
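A minimal sketch of such a program. The data set name sample and the variables name and age are placeholders assumed for illustration, since the earlier example's variables are not reproduced here; substitute the variables in your own file.

```sas
/* Read space-separated values from an external text file. */
/* name and age are hypothetical variable names.           */
data sample;
  infile 'F:\sample.txt';
  input name $ age;
run;

proc print data=sample;
run;
```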

(Since no library is specified, the data set will be saved in the work library.)
There are some details to take note of here. It may seem that the infile statement replaces the "cards" section, but that is not really what happens. The infile statement comes before the input statement, whereas the "cards" section comes after it. (In fact, "cards" must always be placed last in the data step.) When you include a "cards" statement, SAS automatically assumes a default "infile cards;" statement before the input statement. In other words, SAS treats the "cards" section just like an external file. This is reflected in the fact that the data lines after "cards" do not appear among the program statements copied to the log. You can include the infile cards statement explicitly, especially if you need some of the optional settings available with infile. This will be discussed further in a future lesson.
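A sketch with the infile cards statement written out explicitly. The data set name demo, its variables and its two observations are made up for illustration:

```sas
data demo;
  infile cards;      /* explicit; SAS supplies this when it is omitted */
  input name $ age;
  cards;
Alice 30
Bob 25
;
run;
```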
There is also an import wizard available from the File menu. It will load Excel and other popular file types, too. (One caveat for Excel imports: be sure to look under the "options" button in the wizard, where a check box indicates whether the first line contains data or variable names.) In Enterprise Guide, this can be accomplished by going to the "Insert" menu and choosing "data."
Labels for Variables
SAS allows variable names to be up to 32 characters long. You can use upper and lower case letters, numbers, and underscores, but a name cannot start with a number. Programming statements do not distinguish between upper and lower case, but the case you use is remembered for use in output. This flexibility takes care of most variable naming needs. However, there are times when even more flexibility is desired. Perhaps you want true spaces (not underscores) between words, or special characters that are not allowed in names. Or maybe you'd like to use a short name like "LEye" in your program, but want the output to say "Left Eye Acuity." For these situations, SAS allows us to assign labels to variables.

Labels can be up to 256 characters long and may include almost any text symbols. Labels are assigned in the data step and are stored with the data set. Many SAS procedures, like proc print, can use the labels in producing output. Note the syntax of the label statement in the data step: the statement begins with the keyword label, followed by a variable name, an equal sign, and the label in quote marks. More labels may be assigned in the same statement; they are listed one after another, separated by spaces (no commas). To use the labels in proc print, add the label option to the proc print statement. Printing the data both without and with the label option shows the difference.
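A minimal sketch of the label statement and the label option. LEye comes from the text above; the data set name eyes, the variables id and REye, and the data values are hypothetical:

```sas
data eyes;
  input id LEye REye;
  /* Two labels in one statement, separated by spaces */
  label LEye = 'Left Eye Acuity'
        REye = 'Right Eye Acuity';
  cards;
1 20 25
2 30 20
;
run;

/* Column headings show the variable names */
proc print data=eyes;
run;

/* Column headings show the labels */
proc print data=eyes label;
run;
```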

Character Informats
So far we've discussed reading fairly straightforward data consisting of numbers or short words. We will now explore more complex data.
Consider this example:
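A minimal sketch of this kind of example, assuming a single character variable x read with only a dollar sign and one data line (the data set name words is made up):

```sas
data words;
  input x $;     /* character variable, default length 8 */
  cards;
Amphitheater
;
run;

proc print data=words;   /* x = Amphithe */
run;
```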

There are no errors in the log...

Why isn't "Amphitheater" complete? The default length for character variables is eight, so SAS has kept only the first eight characters, even though more are present in the data. We need to tell SAS to make the variable x hold more characters. Try this:
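A sketch of that fix, assuming the same one-line example with the variable x:

```sas
data words;
  input x $12.;   /* character informat: read 12 characters */
  cards;
Amphitheater
;
run;

proc print data=words;   /* x = Amphitheater */
run;
```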

The "$12." expression in the input statement is called an "informat." Think of it as an "input format" that tells SAS what to expect the data to look like. The dollar sign signifies a character variable, and 12 is the width of the field to be read as well as the length of the resulting variable. Every informat contains a period; this is part of the syntax SAS uses to recognize an informat.
Let's look at four more examples to demonstrate the behavior of informats. These examples use two character variables. In the first example, only dollar signs indicate that they are character variables. As expected, the values are cut off at eight characters, but otherwise the data are read correctly:
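A sketch of this first case, assuming variables x and y and the made-up data line "Arena Amphitheater":

```sas
data two;
  input x $ y $;          /* dollar signs only: length 8 each */
  cards;
Arena Amphitheater
;
run;

proc print data=two;      /* x = Arena, y = Amphithe */
run;
```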

Now suppose we attempt to fix the length problem by adding informats. SAS then reads the full 12 characters for x, regardless of whether or not they include spaces, and the next variable starts reading where x stopped:
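A sketch using the made-up line "Arena Amphitheater" with informats but no colon. Here x takes columns 1 to 12, space included, and y begins at column 13:

```sas
data two;
  input x $12. y $12.;    /* formatted input: fixed 12-column fields */
  cards;
Arena Amphitheater
;
run;

proc print data=two;      /* x = Arena Amphit, y = heater */
run;
```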

To fix this, we need SAS to treat a space as a delimiter, just as it does when a dollar sign is used alone. Placing the "colon modifier" in front of the informat gives this result. In fact, a dollar sign used alone behaves like ":$8.".
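A sketch with the colon modifier on the same made-up data line, so each value ends at the next space but may be up to 12 characters long:

```sas
data two;
  input x :$12. y :$12.;  /* colon: read up to the next space */
  cards;
Arena Amphitheater
;
run;

proc print data=two;      /* x = Arena, y = Amphitheater */
run;
```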

The character informat can also be used to create variables shorter than eight characters.  For large data sets, this can result in considerable space savings.  For example, the data set might contain a variable that is a one-character code (such as M or F for male or female).  Using an informat of $1. would then be appropriate, and would save seven bytes of storage for every observation.
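A sketch of a one-character code variable. The data set name people, the variables id and age, and the values are hypothetical; the colon keeps list-input behavior while setting the length to 1:

```sas
data people;
  input id sex :$1. age;   /* sex is stored in 1 byte instead of 8 */
  cards;
1 M 40
2 F 35
;
run;

/* proc contents lists each variable's length */
proc contents data=people;
run;
```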
Running out of Line
An additional problem occurs when these techniques are used to read from an external file. A character informat at the end of a line makes SAS try to read all of the characters in the field width specified by the informat. If the line is shorter than that, SAS writes messages about it to the log and the data are not read correctly. One solution is to add the option "truncover" to the infile statement after the file path. The meaning of "truncover" is roughly "keep whatever is left of the field, even if the line is cut short." (This problem does not occur with instream data.) A colon modifier with the informat can often solve this problem, too.
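A runnable sketch, assuming a temporary fileref named demo stands in for a real file path; the first step only writes a made-up two-line file so the example is self-contained. With your own data, the infile statement would name your file instead.

```sas
/* Create a small external file for the demonstration. */
filename demo temp;

data _null_;
  file demo;
  put 'Arena';
  put 'Amphitheater';
run;

/* truncover: a short last field keeps what is there */
data places;
  infile demo truncover;
  input x $12.;
run;

proc print data=places;   /* x = Arena, then Amphitheater */
run;
```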
There is a related option, "missover", that can be used when there are not enough values at the end of a line and you want the remaining variables set to missing in the SAS data set. Without "missover", SAS would go to the next line and try to continue reading values for the input variables it had not yet filled. The two options behave almost identically. The difference is that when an informat (without a colon) meets a value shorter than the informat width at the end of a line, missover sets that variable to missing, while truncover keeps the partial value. Like missover, truncover also sets variables to missing when one or more values are absent at the end of a line.
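A sketch contrasting the two options on a short line. The temporary fileref demo and its two lines are invented for illustration, and the sketch assumes the file is written with variable-length records, so the line "Arena" really is shorter than 12 characters:

```sas
/* Made-up two-line file in a temporary fileref */
filename demo temp;

data _null_;
  file demo;
  put 'Arena';
  put 'Amphitheater';
run;

/* missover: the short value Arena becomes missing */
data miss;
  infile demo missover;
  input x $12.;
run;

/* truncover: the short value Arena is kept */
data trunc;
  infile demo truncover;
  input x $12.;
run;
```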

Exercises
From now on, include titles in all your output that give your name, the lesson number, and problem number.  Unless otherwise directed, do not change data sets, and turn in your editor, log, and output.
1.  Download the file at this link.  It contains four variables, FirstName, LastName, Age, and Score.  Write a SAS program to read this data from the external file and print it to the output window.  Save the data set in a library that you specify (not work).  Experiment with title and options statements to change the appearance of your output.
2.  Copy the data below into the SAS editor and write a data step to read it, followed by a proc step to print it (to the output window).  Make sure the printout matches the original data.  The variables are first and last name, sex, and age.  Include labels in your data set and output.  Also include a two-line title and suppress the printing of the time and date on each page.
Andy Stewart M 47
Martha Gustafson F 55
Marissa Maneschevitz F 32
John Fitzgerald M 28
Jacqueline Martin F 33
3.  Download the file at this link.  Similar to a previous exercise, it contains four variables, Age, Score, LastName, and FirstName.  Examine the data file carefully, then write a SAS program to read this data from the external file and print it to the output window.  Include an appropriate title.
