SAS Viya - Kevin D. Smith
Executing Actions on CAS Tables
The simple action set that comes with CAS contains some basic analytic actions. You can use either the help action or the IPython ? operator to view the available actions.
In [17]: conn.simple?
Type:        Simple
String form: <swat.cas.actions.Simple object at 0x4582b10>
File:        swat/cas/actions.py
Definition:  conn.simple(self, *args, **kwargs)
Docstring:
Analytics

Actions
-------
simple.correlation : Generates a matrix of Pearson product-moment
                     correlation coefficients
simple.crosstab    : Performs one-way or two-way tabulations
simple.distinct    : Computes the distinct number of values of the
                     variables in the variable list
simple.freq        : Generates a frequency distribution for one or
                     more variables
simple.groupby     : Builds BY groups in terms of the variable value
                     combinations given the variables in the variable
                     list
simple.mdsummary   : Calculates multidimensional summaries of numeric
                     variables
simple.numrows     : Shows the number of rows in a Cloud Analytic
                     Services table
simple.paracoord   : Generates a parallel coordinates plot of the
                     variables in the variable list
simple.regression  : Performs a linear regression up to 3rd-order
                     polynomials
simple.summary     : Generates descriptive statistics of numeric
                     variables such as the sample mean, sample
                     variance, sample size, sum of squares, and so on
simple.topk        : Returns the top-K and bottom-K distinct values of
                     each variable included in the variable list based
                     on a user-specified ranking order
Let’s run the summary action on our CAS table.
In [18]: summ = iris.summary()
In [19]: summ
Out[19]:
[Summary]

 Descriptive Statistics for IRIS

         Column  Min  Max      N  NMiss      Mean    Sum       Std  \
 0  SepalLength  4.3  7.9  150.0    0.0  5.843333  876.5  0.828066
 1   SepalWidth  2.0  4.4  150.0    0.0  3.054000  458.1  0.433594
 2  PetalLength  1.0  6.9  150.0    0.0  3.758667  563.8  1.764420
 3   PetalWidth  0.1  2.5  150.0    0.0  1.198667  179.8  0.763161

      StdErr       Var      USS         CSS         CV     TValue  \
 0  0.067611  0.685694  5223.85  102.168333  14.171126  86.425375
 1  0.035403  0.188004  1427.05   28.012600  14.197587  86.264297
 2  0.144064  3.113179  2583.00  463.863733  46.942721  26.090198
 3  0.062312  0.582414   302.30   86.779733  63.667470  19.236588

            ProbT
 0  3.331256e-129
 1  4.374977e-129
 2   1.994305e-57
 3   3.209704e-42

+ Elapsed: 0.0256s, user: 0.019s, sys: 0.009s, mem: 1.74mb
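Several columns in this output are derived from the basic moments. As a minimal sketch in plain Python (standard library only; this is not how CAS computes them, and the function name summary_stats and the sample values are made up for illustration), the relationships between the columns can be written out like this:

```python
import math
import statistics


def summary_stats(values):
    """Compute a few of the columns shown in the summary output.

    Illustrative helper only; it is not part of SWAT or CAS.
    """
    n = len(values)
    mean = statistics.fmean(values)
    var = statistics.variance(values)   # sample variance (n - 1 divisor)
    std = math.sqrt(var)
    stderr = std / math.sqrt(n)         # standard error of the mean
    return {
        "N": n,
        "Mean": mean,
        "Sum": math.fsum(values),
        "Std": std,
        "StdErr": stderr,
        "Var": var,
        # Uncorrected sum of squares: sum of x**2
        "USS": math.fsum(v * v for v in values),
        # Corrected sum of squares: sum of (x - mean)**2
        "CSS": math.fsum((v - mean) ** 2 for v in values),
        # Coefficient of variation, expressed as a percentage
        "CV": 100.0 * std / mean,
        # t statistic for the hypothesis that the mean is zero
        "TValue": mean / stderr,
    }


# Hypothetical sample values, not the iris data.
stats = summary_stats([5.1, 4.9, 4.7, 4.6, 5.0])
for name, value in stats.items():
    print(f"{name:>7}: {value}")
```

The same relationships can be checked by hand against the SepalLength row: 0.828066 / sqrt(150) is approximately 0.067611 (StdErr), and 100 * 0.828066 / 5.843333 is approximately 14.171 (CV).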
The summary action presents its statistics in a layout that is familiar to SAS users. If you prefer the layout that Pandas users expect, you can call the describe method, just as you would on a DataFrame.
In [20]: iris.describe()
Out[20]:
       SepalLength  SepalWidth  PetalLength  PetalWidth
count   150.000000  150.000000   150.000000  150.000000
mean      5.843333    3.054000     3.758667    1.198667
std       0.828066    0.433594     1.764420    0.763161
min       4.300000    2.000000     1.000000    0.100000
25%       5.100000    2.800000     1.600000    0.300000
50%       5.800000    3.000000     4.350000    1.300000
75%       6.400000    3.300000     5.100000    1.800000
max       7.900000    4.400000     6.900000    2.500000
Note that when you call the describe method on a CASTable object, it runs several CAS actions in the background to do the calculations. These include the summary, percentile, and topk actions. The output of those actions is combined into a DataFrame in the same form that the real Pandas DataFrame describe method returns. As a result, you can use CASTable objects and DataFrame objects interchangeably in your workflow for this method and for many other methods.
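The idea of combining separately computed pieces into one describe-style result can be sketched in plain Python. This is only an illustration under stated assumptions: no CAS actions are called, the function name describe_values is hypothetical, and the quartile definition (linear interpolation via method="inclusive") is a choice made for the sketch that may differ from what the percentile action uses.

```python
import statistics


def describe_values(values):
    """Assemble a describe-style summary from separately computed pieces.

    Illustrative only: standard-library calls stand in for the
    server-side computations; this is not the SWAT implementation.
    """
    data = sorted(values)

    # Piece 1: moments (the kind of result a summary computation provides).
    moments = {
        "count": float(len(data)),
        "mean": statistics.fmean(data),
        "std": statistics.stdev(data),   # sample standard deviation
    }

    # Piece 2: quartiles (the kind of result a percentile computation provides).
    # method="inclusive" interpolates linearly between order statistics.
    q25, q50, q75 = statistics.quantiles(data, n=4, method="inclusive")

    # Piece 3: extremes, taken directly from the sorted data in this sketch.
    low, high = float(data[0]), float(data[-1])

    # Combine the pieces in the row order that describe output uses.
    return {
        "count": moments["count"],
        "mean": moments["mean"],
        "std": moments["std"],
        "min": low,
        "25%": q25,
        "50%": q50,
        "75%": q75,
        "max": high,
    }


# Hypothetical sample values, not the iris data.
for label, value in describe_values([1, 2, 3, 4, 5]).items():
    print(f"{label:<6}{value:.6f}")
```

Because the result has the same row labels in the same order regardless of where the pieces came from, code that consumes it does not need to know which backend produced the numbers.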