Saturday, 18 June 2016

PROC SPELL: The Spell of SAS PART -2

Continuing from PROC SPELL: The Spell of SAS PART -1, this is part 2, meant to make you more comfortable with PROC SPELL.
Let's take an example to understand how PROC SPELL works.

First, create a text file containing misspelled words:

Industries understand special traininng needs
Industry understand special training needs
Industrys understund special traininng needs
industri understand spesial training needs
indastry usderstand special training needs
industre undarstund special trainng needs

You should not trast anyone blindly

Now point SAS at this text file:

%let location = M:\sasfolder\SAS files;
libname data "&location.";

filename sample "&location.\misspelled file.txt";

Now create a catalog of the words in the file:
Proc SPELL words = sample
Create dict = data.mycatgalog.Spell;

Run;

Now redirect procedure output to a file that will hold the results:

Proc Printto print = "&location.\output.txt" new;
Run;

/*  Now use Proc Spell to identify the misspelled words and suggest corrections; the output goes to the file initialized above  */

Proc Spell in = sample
dictionary = data.mycatgalog.Spell
verify suggest;
run;

Proc Printto print = print; Run;


/*  Open the output file to see what Proc Spell produced, then read that output back into SAS */

Data data.List_correction;
infile "&location.\output.txt" missover firstobs = 7 ;
input A & $1000. ;

Run;

/*  Transform the output file into readily usable form */
Data data.List_correction;
set data.List_correction;
retain id 1;
if A = "" then id +1;
Run;
Proc transpose data =  data.List_correction out = data.transposed;
by id;
var A;
where A ~="";

Run;

data data.transposed;
length suggested $1000.;
retain id original_word  suggested;
set data.transposed (drop = _name_);
rename Col1 = original_word;
suggested = scan(Col2,2,":");
drop col2;
run;


Data data.transposed;
retain id 
original_word  suggested;
set data.transposed;

run;

... and here we are with a list of incorrect words and their suggested corrections. For a few words we might not get any suggestion, and for others we might get more than one. The next step is to build the logic that replaces each wrong word with the most appropriate correction. You can build a macro and use the TRANWRD function to replace the word.
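As a minimal sketch of that replacement step, the DATA step below applies TRANWRD once per correction to each line of text. The data sets text_lines and corrections, and the variable names line and replacement, are hypothetical and only stand in for your own text and for a reviewed version of the suggestion list built above. Note that TRANWRD replaces substrings, not whole words, so short misspellings could also match inside longer words.

```sas
/* Hypothetical input text (two lines from the sample file above) */
data text_lines;
   length line $200;
   infile datalines truncover;
   input line $200.;
   datalines;
Industrys understund special traininng needs
You should not trast anyone blindly
;
run;

/* Hypothetical lookup table: one reviewed correction per wrong word */
data corrections;
   length original_word replacement $50;
   input original_word :$50. replacement :$50.;
   datalines;
traininng training
trast trust
;
run;

/* For every text line, loop over all corrections and apply TRANWRD */
data fixed (keep = line);
   set text_lines;
   do i = 1 to n;
      set corrections point = i nobs = n;
      line = tranwrd(line, strip(original_word), strip(replacement));
   end;
run;
```

Reading the lookup table with POINT= keeps everything in one DATA step; for a large correction list, a hash object or a generated macro would be reasonable alternatives.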


This page has taken reference from the link: SAS Gems



PROC SPELL: The Spell of SAS PART -1

In the first example below, Spelt is a fileref for the file containing the text to be spell-checked, and Words is a libref pointing to the SAS library where the spell checker word list is located. The SUGGEST option lists spelling suggestions from the dictionary.
VERIFY is the default and is assumed unless a custom catalog is being created or updated with PROC SPELL.
To be more useful, a custom dictionary of industry-relevant words that are not in the SAS dictionary can be created. This dictionary is a SAS catalog entry and is created with the SPELL procedure. To create a dictionary:
Create a text file with the upper case words that you want to define.
Put each word on a separate line.
Point to the SAS file that holds the dictionary catalog (if updating) or create a new catalog.
Point to the location of the custom word list.

filename Spelt '<file-specification>';

PROC Spell in         = Spelt 
           dictionary = Words.spell.mywords 
                        verify 
                        suggest;
run;
/***********************************************/
filename textdata 'c:\text.txt';
PROC Spell in = textdata
           verify;
run;
/***********************************************/
The code below creates the custom word catalog. The newly created catalog includes the entries in the SAS catalog in addition to the added custom words. To add words to an existing custom catalog, change CREATE to UPDATE.

PROC Spell words  = "e:sasfolder\condition.txt"
           create 
             dict = work.mycatalog.spell;
run;
/*********************************************/
/***  Using Proc Spell, find all combinations of letters   ***/
/***  from a phone number that form valid words            ***/
/***  of 3 or 4 letters.                                   ***/

%let myphone = 8 6 0 6 7 3 9 2 7 8; /*** 10 digit phone number ***/
 /*** with spaces between each number as delimiters ***/

 Data A (Keep = L) ;
   Array C (0:9, 4) $1. _Temporary_
                        (' '  ' '  ' '  ' '
                         ' '  ' '  ' '  ' '
                         'A'  'B'  'C'  ' '
                         'D'  'E'  'F'  ' '
                         'G'  'H'  'I'  ' '
                         'J'  'K'  'L'  ' '
                         'M'  'N'  'O'  ' '
                         'P'  'Q'  'R'  'S'
                         'T'  'U'  'V'  ' '
                         'W'  'X'  'Y'  'Z');

  /***                       0 1 2 3 4 5 6 7 8 9 ***/
  Array T (0:9) _Temporary_ (1 1 3 3 3 3 3 4 3 4) ;
  /***                       my    phone number  ***/
  Array P (0:9)  P0 - P9    (&myphone);
  Length L  $12. ;

  Do X0=1 To T(P0) ;  SubStr(L, 1,1) = C(P0,X0) ;
  Do X1=1 To T(P1) ;  SubStr(L, 2,1) = C(P1,X1) ;
  Do X2=1 To T(P2) ;  SubStr(L, 3,1) = C(P2,X2) ;
/*** create a space for position 4 ***/
  Do X3=1 To T(P3) ;  SubStr(L, 5,1) = C(P3,X3) ;
  Do X4=1 To T(P4) ;  SubStr(L, 6,1) = C(P4,X4) ;
  Do X5=1 To T(P5) ;  SubStr(L, 7,1) = C(P5,X5) ;
/*** create a space for position 8 ***/
  Do X6=1 To T(P6) ;  SubStr(L, 9,1) = C(P6,X6) ;
  Do X7=1 To T(P7) ;  SubStr(L,10,1) = C(P7,X7) ;
  Do X8=1 To T(P8) ;  SubStr(L,11,1) = C(P8,X8) ;
  Do X9=1 To T(P9) ;  SubStr(L,12,1) = C(P9,X9) ;
     Output ;
  End; End; End; End; End; End; End; End; End; End;
Run ;
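
Before running the nested loops it can help to know how many candidate strings they will write. The following is only an illustrative check, not part of the original program: it multiplies the number of letter choices for each digit, reusing the same T counts as above. The variable names combos and i are made up for this sketch.

```sas
%let myphone = 8 6 0 6 7 3 9 2 7 8;

data _null_;
   /* Letter choices per digit 0-9, same values as array T above */
   array T (0:9) _temporary_ (1 1 3 3 3 3 3 4 3 4);
   array P (0:9) P0 - P9 (&myphone);
   combos = 1;
   do i = 0 to 9;
      combos = combos * T(P(i));
   end;
   /* Writes the number of rows data set A will contain to the log */
   put combos=;
run;
```

For the phone number above this gives 3*3*1*3*4*3*4*3*4*3 = 46656 candidate strings, which is the size of the word list handed to PROC SPELL.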

 /*** create an external bad word file for PROC SPELL ***/
 data _null_;
 file "testprocspell.txt";
 set a;
 put L;
 run;

 proc printto file="badwords.lis" new; run;
 proc spell wordlist="testprocspell.txt"; run;
 proc printto; run;

  /*** read bad words created by proc spell ***/
  data badwords;
   infile 'badwords.lis'      missover ;
   input @1 badwords $ 30. ;
   if badwords = ' ' then delete;
   badwords = left(badwords);
   if 0 ne index(badwords, 'File:') then delete;
   if 0 ne index(badwords, 'Unrecognized') then delete;
  run;

  proc sort data=badwords out=badwords nodupkey; by badwords; run;

  /*** now create a badword cleanup file ***/
  data clnbadwd (drop=badwords);
   length target $ 50 replace $ 50;
   set badwords;
   target = badwords; replace = ' ';
  run;

  /*** Using TIP 00128a (CLEANSEM.sas) ***/
  %include 'cleanse_my.sas'; /*** my temporary version ***/

  /*** need to check each schema for cyclical mappings  ***/
  /*** as well as create a global macro variable to get ***/
  /*** number of records for each schema used           ***/
  /*** CHECK OUTPUT for any cyclical mapping reports    ***/

  %chkschem( schema=clnbadwd);

 data clean;
  set a;
        %cleanse(schema=clnbadwd ,var=L       );
  if L ne ' '; /*** keep only non blank L ***/
 run;

 proc sort data=clean out=clean nodupkey; by L; run;

 /*** now create an output file of just valid words ***/
 data _null_;
  set work.clean;
  file "cleanwords.txt";
  put L;
 run;

/*** output from Clean Words Recognized by PROC SPELL 
     default Dictionary                                 ***/
ORE
ORE PART
ORE PAST
ORE WART
PART
PAST
TO
TO    PART
TO    PAST
TO    WART
TO  ORE
TO  ORE PART
TO  ORE PAST
TO  ORE WART
WART

/*** end of program ***/

Reference for PROC Spell: http://analytics.ncsu.edu/sesug/2007/SD06.pdf

The practice part continues in the next blog post.

Sunday, 5 June 2016

BASE SAS : Interview Questions (Written Exams)

QUESTION 1
The following SAS program is submitted:
data one;
address1 = '214 London Way';
run;
data one;
set one;
address = tranwrd(address1, 'Way', 'Drive');
run;
What are the length and value of the variable ADDRESS?
Options:
A.    Length is 14; value is ‘214 London Dri’.
B.    Length is 14; value is ‘214 London Way’.
C.    Length is 16; value is ‘214 London Drive’.
D.   Length is 200; value is ‘214 London Drive’.
QUESTION 2
The following program is submitted.
data WORK.TEST;
  input Name $ Age;
datalines;
John +35
;
run;
Which values are stored in the output data set?
Options:
A.    Name              Age
---------------------
John               35
B.    Name              Age
---------------------
John              (missing value)
C.    Name              Age
---------------------
(missing value)   (missing value)
D.    The DATA step fails execution due to data errors.
QUESTION 3
Given the SAS data set WORK.ONE:
Id  Char1
---  -----
182  M
190  N
250  O
720  P
and the SAS data set WORK.TWO:
 Id  Char2
---  -----
182  Q
623  R
720  S
The following program is submitted:
data WORK.BOTH;
   merge WORK.ONE WORK.TWO;
   by Id;
run;
What is the first observation in the SAS data set WORK.BOTH?
Options:
A.    Id  Char1  Char2
---  -----  -----
182  M

B.    Id  Char1  Char2
---  -----  -----
182         Q

C.    Id  Char1  Char2
---  -----  -----
182  M      Q

D.    Id  Char1  Char2
---  -----  -----
720  P      S

QUESTION 4
Which program displays a listing of all data sets in the SASUSER library?
Options:
A.    proc contents lib = sasuser.all; run;
B.    proc contents data = sasuser.all; run;
C.    proc contents lib = sasuser._all_; run;
D.   proc contents data = sasuser._all_; run;

QUESTION 5
The following SAS program is submitted:
proc format;
  value score  1  - 50  = 'Fail'
              51 - 100  = 'Pass';
run;
Which one of the following PRINT procedure steps correctly applies the format?
Options:
A.    proc print data = SASUSER.CLASS;
   var test;
   format test score;
run;
B.    proc print data = SASUSER.CLASS;
   var test;
   format test score.;
run;
C.    proc print data = SASUSER.CLASS format = score;
   var test;
run;
D.   proc print data = SASUSER.CLASS format = score.;
   var test; 
run;

QUESTION 6
 Given the text file COLORS.TXT:
----+----1----+----2----+----
RED    ORANGE  YELLOW  GREEN
BLUE   INDIGO  PURPLE  VIOLET
CYAN   WHITE   FUCSIA  BLACK
GRAY   BROWN   PINK    MAGENTA
The following SAS program is submitted:
data WORK.COLORS;
  infile 'COLORS.TXT';
  input @1 Var1 $ @8 Var2 $ @;
  input @1 Var3 $ @8 Var4 $ @;
run;
What will the data set WORK.COLORS contain?
Options:
A.    Var1     Var2     Var3    Var4
------   ------   ------  ------
RED      ORANGE   RED     ORANGE
BLUE     INDIGO   BLUE    INDIGO
CYAN     WHITE    CYAN    WHITE
GRAY     BROWN    GRAY    BROWN

B.    Var1     Var2     Var3    Var4
------   ------   ------  ------
RED      ORANGE   BLUE    INDIGO
CYAN     WHITE    GRAY    BROWN

C.    Var1     Var2     Var3    Var4
------   ------   ------  ------
RED      ORANGE   YELLOW  GREEN
BLUE     INDIGO   PURPLE  VIOLET

D.   Var1     Var2     Var3    Var4
------   ------   ------  ------
RED      ORANGE   YELLOW  GREEN
BLUE     INDIGO   PURPLE  VIOLET
CYAN     WHITE    FUCSIA  BLACK
GRAY     BROWN    PINK    MAGENTA

QUESTION 7
The following SAS program is submitted:
data work.accounting;
set work.dept1 work.dept2;
jobcode = 'FA1';
length jobcode $ 8;
run;
A character variable named JOBCODE is contained in both the WORK.DEPT1 and WORK.DEPT2 SAS data sets. The variable JOBCODE has a length of 5 in the WORK.DEPT1 data set and a length of 7 in the WORK.DEPT2 data set. What is the length of the variable JOBCODE in the output data set?
Options:
A.    3
B.    5
C.    7
D.   8

QUESTION 8
Given the SAS data set WORK.INPUT:
Var1     Var2
------   -------
A        one
A        two
B        three
C        four
A        five
The following SAS program is submitted:
data WORK.ONE WORK.TWO;
  set WORK.INPUT;
  if Var1='A' then output WORK.ONE;
  output;
run;
How many observations will be in data set WORK.ONE? 
Options: 
A.    8
B.    7
C.    9 
D.   5

QUESTION 9
Given the following SAS error log
data WORK.OUTPUT;
   set SASHELP.CLASS;
   BMI=(Weight*703)/Height**2;
   where bmi ge 20;
   ERROR: Variable bmi is not on file SASHELP.CLASS.
run;
What change to the program will correct the error?
Options:
A.    Replace the WHERE statement with an IF statement
B.    Change the ** in the BMI formula to a single *
C.    Change bmi to BMI in the WHERE statement
D.    Add a (Keep=BMI) option to the SET statement

QUESTION 10
The following SAS program is submitted:
footnote1 'Sales Report for Last Month';
footnote2 'Selected Products Only';
footnote3 'All Regions';
footnote4 'All Figures in Thousands of Dollars';
proc print data = sasuser.shoes;
footnote2 'All Products';
run;
Which footnote(s) is/are displayed in the report?
Options:
A.    All Products
B.    Sales Report for Last Month All Products
C.    All Products All Regions All Figures in Thousands of Dollars
D.   Sales Report for Last Month All Products All Regions All Figures in Thousands of Dollars

QUESTION 11
The following SAS program is submitted:
data WORK.LOOP;
  X = 0;
  do Index = 1 to 5  by  2;
    X = Index;
  end;
run;
Upon completion of execution, what are the values of the variables X and Index in the SAS data set named WORK.LOOP?
Options:
A.    X = 3, Index = 5
B.    X = 5, Index = 5
C.    X = 5, Index = 6
D.    X = 5, Index = 7

QUESTION 12
This item will ask you to provide a line of missing code.
The SAS data set WORK.INPUT contains 10 observations, and includes the numeric variable Cost. 
The following SAS program is submitted to accumulate the total value of Cost for the 10 observations:
data WORK.TOTAL;
  set WORK.INPUT;
  <insert code here>
  Total=Total+Cost;
run;
Which statement correctly completes the program?
Options:
A.    keep Total;
B.    retain Total 0;
C.    Total = 0;
D.    If _N_= 1 then Total = 0;

QUESTION 13
Given the raw data record DEPT:
----|----10---|----20---|----30
Printing 750
The following SAS program is submitted:
data bonus;
infile 'dept';
input dept $ 1-11 number 13-15;
<insert statement here>
run;
Which SAS statement completes the program and results in a value of ‘Printing750’ for the DEPARTMENT variable?
Options:
A.    department = dept || number;
B.    department = left(dept) || number;
C.    department = trim(dept) number;
D.   department=trim(dept)||put(number,3.);

QUESTION 14
This question will ask you to provide a line of missing code. 
Given the following data set WORK.SALES:
SalesID  SalesJan  FebSales  MarchAmt
-------  --------  --------  --------
W6790          50       400       350
W7693          25       100       125
W1387           .       300       250
The following SAS program is submitted:
data WORK.QTR1;
   set WORK.SALES;
   array month{3} SalesJan FebSales MarchAmt;
   <insert code here>
run;
Which statement should be inserted to produce the following output?
SalesID  SalesJan  FebSales  MarchAmt  Qtr1
-------  --------  --------  --------  ----
W6790          50       400       350   800
W7693          25       100       125   250
W1387           .       300       250   550

Options:
A.    Qtr1 = sum(of month{_ALL_});
B.    Qtr1 = month{1} + month{2} + month{3};
C.    Qtr1 = sum(of month{*});
D.   Qtr1 = sum(of month{3});

QUESTION 15
The SAS data set SASUSER.HOUSES contains a variable PRICE which has been assigned a permanent label of “Asking Price”. Which SAS program temporarily replaces the label “Asking Price” with the label “Sale Price” in the output?
Options:
A.    proc print data = sasuser.houses; label price = "Sale Price"; run;
B.    proc print data = sasuser.houses label; label price "Sale Price"; run;
C.    proc print data = sasuser.houses label; label price = "Sale Price"; run;
D.    proc print data = sasuser.houses; price = "Sale Price"; run;

QUESTION 16
The following SAS program is submitted:
data WORK.TEMP;
  Char1='0123456789';
  Char2=substr(Char1,3,4);
run;
What is the value of Char2?
Options:
A.    23
B.    34
C.    345
D.   2345 

QUESTION 17
The SAS data sets WORK.EMPLOYEE and WORK.SALARY are shown below:
WORK.EMPLOYEE
fname  age
-----  ---
Bruce   30
Dan     40

WORK.SALARY
name   salary
-----  ------
Bruce   25000
Bruce   35000
Dan     25000
The following SAS program is submitted:
data work.empdata;
<insert statement here>
by fname;
totsal + salary;
run;
Which one of the following statements completes the merge of the two data sets by the FNAME variable?
Options:
A.    merge work.employee
work.salary (fname = name);
B.    merge work.employee
work.salary (name = fname);
C.    merge work.employee
work.salary (rename = (fname = name));
D.   merge work.employee
work.salary (rename = (name = fname));

QUESTION 18
The following SAS program is submitted:
data work.sets;
do until (prod gt 6);
prod + 1;
end;
run;
What is the value of the variable PROD in the output data set?
Options:
A.    6
B.    7
C.    8
D.   (missing numeric)