The C3 AI Suite has Types specifically designed to facilitate certain machine learning pipelines, taking their inspiration from the scikit-learn machine learning pipeline. At the most general level, the `MLPipe` Type defines the basic behavior of a pipeline with the `train`, `process`, and `score` methods, and various specializations define a whole pipeline of operations. With this structure, you can call `train` on the top-level pipeline Type and the whole pipeline is trained; the same goes for `process` and `score`, which process inputs and score results. We'll start by discussing the basic Types which are available, beginning with the abstract Types forming the basis of this machine learning system, then some concrete implementations of those Types, and finally some examples. We'll then discuss how a new machine learning model may be ported onto the C3 AI platform.
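For readers who know scikit-learn, the train/process/score pattern these Types mirror can be sketched in plain scikit-learn (this is ordinary scikit-learn code, not the C3 API; the C3 equivalents are built step by step later in this article):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), test_size=0.2, random_state=42)

# A serial pipeline of named steps, analogous to an MLSerialPipeline of MLSteps.
pipeline = Pipeline([
    ("standardScaler", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("logisticRegression", LogisticRegression(random_state=42)),
])

pipeline.fit(X_train, y_train)              # cf. calling train() on the top-level MLPipe
predictions = pipeline.predict(X_test)      # cf. process()
accuracy = pipeline.score(X_test, y_test)   # cf. score() with an accuracy metric
```

Calling `fit` on the top-level `Pipeline` trains every step in order, just as calling `train` on the top-level C3 pipeline Type trains the whole pipeline.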

**MLPipe**: An abstract Type which defines general behaviors like `train`, `process`, and `score`. This Type is mixed in by nearly all Types in the C3 AI Machine Learning space.

**MLPipeline**: An abstract Type which mixes in MLPipe and connects multiple steps into a full pipeline. The provided concrete implementation of this Type is `MLSerialPipeline`. The MLPipeline contains a list of MLSteps to perform.

**MLStep**: A helper Type which allows MLSerialPipeline and other implementations of MLPipeline to store their pipeline steps and metadata.

**MLLeafPipe**: An abstract Type mixing in MLPipe which is meant for steps in a machine learning pipeline that have no child steps, i.e. a step performing a specific action of the pipeline. In the terminology of the C3 AI Suite, this is a 'leaf' of the pipeline. There are several concrete versions of this Type, usually corresponding to different implementations within various machine learning systems.

**CustomPythonMLPipe**: A helper Type which acts as a base for defining new Python-based machine learning Pipes.

**MLSerialPipeline**: The concrete implementation of the MLPipeline Type. Since MLSerialPipeline is so general, you won't have to subclass it.

**SklearnPipe**: This concrete implementation of MLLeafPipe provides a straightforward way to use sklearn machine learning pre-processing and modelling functions. See the TutorialIntroMLPipeline.ipynb notebook on your C3 Cluster for more information.


**TensorflowClassifier**: This Type allows the user to store a tensorflow estimator-based model, specifically a classifier. This Type currently works with tensorflow 1.9.0. See the TutorialTensorflowPipe.ipynb notebook on your C3 Cluster for more information.

**TensorflowRegressor**: This Type allows the user to store a tensorflow estimator-based model, specifically a regressor. This Type currently works with tensorflow 1.9.0. See the TutorialTensorflowPipe.ipynb notebook on your C3 Cluster for more information.

**KerasPipe**: This Type allows the user to store a keras tensorflow model. This Type currently uses tensorflow 2.1.0. See the TutorialKerasPipe.ipynb notebook on your C3 Cluster for more information.

**XgBoostPipe**: This Type implements the sklearn-compatible part of the XGBoost library. See the TutorialXgBoostMLPipeline.ipynb notebook on your C3 Cluster for more information.

Let's take a look at the C3 AI developed TutorialIntroMLPipeline.ipynb notebook.

The first step in any machine learning task is to prepare the data. In this case, we're going to use the popular iris dataset. We first use some sklearn functions to load the data and split it into a training set and testing set.

```
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
datasets_np = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
```

Now, `datasets_np` consists of four numpy arrays: training data, testing data, training targets, and testing targets. If we inspect the array dimensions, we expect the training data to have shape '(120, 4)', the testing data '(30, 4)', the training targets '(120,)', and the testing targets '(30,)'.
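These shapes can be checked directly; a small sketch using the same split (iris has 150 samples, so an 80/20 split yields 120 training and 30 testing rows):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
# train_test_split returns train/test pairs for each input, in order.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(y_train.shape, y_test.shape)  # (120,) (30,)
```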

We must now convert these numpy arrays into something usable by the C3 AI Suite, a **Dataset**. C3 AI provides the helper function `c3.Dataset.fromPython()` to convert numpy arrays to Datasets:

```
XTrain, XTest, yTrain, yTest = [c3.Dataset.fromPython(pythonData=ds_np) for ds_np in datasets_np]
```

C3 AI Machine Learning Pipelines can be thought of as a series of steps. These steps can be nested, so we can define a 'preprocessing' step (perhaps containing multiple steps itself) which can normalize and transform data into a better form for ML algorithms, and a regression step which runs the ML model on the transformed data. So, we'll build the MLPipeline step by step.
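To make the nesting concrete, here is a toy Python sketch of the pattern (purely illustrative; the class names are hypothetical and this is not the C3 implementation): a leaf pipe performs one action, and a serial pipeline simply delegates `train` and `process` to its child steps in order, feeding each step's output to the next.

```python
class ToyLeafPipe:
    """One concrete action, e.g. wrapping a scaler or a model (cf. MLLeafPipe)."""
    def __init__(self, fit_fn, process_fn):
        self.fit_fn, self.process_fn = fit_fn, process_fn
        self.state = None

    def train(self, data):
        self.state = self.fit_fn(data)   # learn parameters from the data
        return self.process(data)        # pass transformed data onward

    def process(self, data):
        return self.process_fn(self.state, data)


class ToySerialPipeline:
    """Chains child pipes in sequence (cf. MLSerialPipeline holding MLSteps)."""
    def __init__(self, steps):
        self.steps = steps  # list of (name, pipe) pairs

    def train(self, data):
        for _, pipe in self.steps:
            data = pipe.train(data)
        return data

    def process(self, data):
        for _, pipe in self.steps:
            data = pipe.process(data)
        return data


# Example: a "center around the mean" step followed by a "square" step.
mean_center = ToyLeafPipe(fit_fn=lambda xs: sum(xs) / len(xs),
                          process_fn=lambda mean, xs: [x - mean for x in xs])
square = ToyLeafPipe(fit_fn=lambda xs: None,
                     process_fn=lambda _, xs: [x * x for x in xs])

pipeline = ToySerialPipeline([("center", mean_center), ("square", square)])
print(pipeline.train([1.0, 2.0, 3.0]))  # [1.0, 0.0, 1.0]
```

Because `ToySerialPipeline` exposes the same `train`/`process` interface as a leaf, a pipeline can itself be used as a step inside a larger pipeline, which is exactly the nesting described above.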

Let's build a preprocessing pipeline which will first standardize the data (zero mean and unit variance per feature), then perform a principal component analysis and extract the first two principal components. These components will be easy for a machine learning algorithm to use.

First, let's build sklearn StandardScaler and PCA decomposition MLLeafPipes to contain those transformation steps:

```
# Define the MLLeafPipe which holds the StandardScaler step.
standardScaler = c3.SklearnPipe(
    name="standardScaler",
    technique=c3.SklearnTechnique(
        # This tells ML pipeline to import sklearn.preprocessing.StandardScaler.
        name="preprocessing.StandardScaler",
        # This tells ML pipeline to call the "transform" method on
        # sklearn.preprocessing.StandardScaler when we invoke the C3 action process() later.
        processingFunctionName="transform"
    )
)

# Define the MLLeafPipe which holds the PCA decomposition step.
pca = c3.SklearnPipe(
    name="pca",
    technique=c3.SklearnTechnique(
        name="decomposition.PCA",
        processingFunctionName="transform",
        # hyperParameters are passed to sklearn.decomposition.PCA as kwargs.
        hyperParameters={"n_components": 2}
    )
)
```

Now we can combine these two steps into a preprocessing MLSerialPipeline:

```
# Define the preprocessing pipeline which chains the scaler and pca steps together.
preprocessPipeline = c3.MLSerialPipeline(
    name="preprocessPipeline",
    steps=[c3.MLStep(name="standardScaler", pipe=standardScaler),
           c3.MLStep(name="pca", pipe=pca)]
)
```

Note that we store the individual steps as `MLStep` Types in the `steps` array.

Now we have `preprocessPipeline` containing our preprocessing pipeline! We can use this pipeline as part of a larger pipeline now.

We now need to define our logistic regression model. We'll create an SklearnPipe (a concrete MLLeafPipe) which contains this example:

```
# Leaf-level Logistic Regression classifier.
logisticRegression = c3.SklearnPipe(
    name="logisticRegression",
    technique=c3.SklearnTechnique(
        name="linear_model.LogisticRegression",
        processingFunctionName="predict",
        hyperParameters={"random_state": 42}
    )
)
```

This SklearnPipe now contains the sklearn model.

Now, we build the final pipeline:

```
# Define the complete MLPipeline.
lrPipeline = c3.MLSerialPipeline(
    name="lrPipeline",
    steps=[c3.MLStep(name="preprocess", pipe=preprocessPipeline),
           c3.MLStep(name="classifier", pipe=logisticRegression)],
    scoringMetrics=c3.MLScoringMetric.toScoringMetricMap(
        scoringMetricList=[c3.MLAccuracyMetric()]
    )
)
```

This definition looks identical to the `preprocessPipeline` from before but for one addition: the `scoringMetrics` parameter. This parameter is for pipelines that are meant to be trained. Here, we define the metric by which the trained pipeline will be scored. In this case, the `MLAccuracyMetric` has been chosen. Other metrics are available as well.
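In scikit-learn terms, the accuracy metric chosen here corresponds to `sklearn.metrics.accuracy_score` (plain scikit-learn shown for illustration, not the C3 API):

```python
from sklearn.metrics import accuracy_score

# Accuracy = fraction of predictions that match the target labels.
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 1, 2, 1, 1]
print(accuracy_score(y_true, y_pred))  # 0.8 (4 of 5 correct)
```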

Execute `c3ShowType(MLScoringMetric)` in the JavaScript console and look at "Used By" to see other Types which mix in the MLScoringMetric Type.

Once our MLPipeline has been defined, we can use the training data from before to train it:

```
trainedLr = lrPipeline.train(input=XTrain, targetOutput=yTrain)
```

This produces a **new** object, `trainedLr`, which contains all of the parameters and model definitions of the trained MLPipeline. We can now inspect this model's parameters and evaluate it on new data:

```
param_map = trainedLr.params()
param_map['preprocess__standardScaler__mean_']
```

This might return something like:

```
c3.Arry<double>([5.809166666666665, 3.0616666666666674, 3.726666666666667, 1.1833333333333333])
```
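These values are just the per-feature means of the training data learned by the StandardScaler step. Assuming the same random_state=42 split as above, they can be reproduced in plain scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# StandardScaler stores the column means it learned during fit() in mean_.
scaler = StandardScaler().fit(X_train)
print(scaler.mean_)  # per-feature means of the 120 training rows
```

The `preprocess__standardScaler__` prefix in the parameter key reflects the nesting: the `mean_` attribute belongs to the `standardScaler` step inside the `preprocess` step.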

And to evaluate on new data:

```
prediction = trainedLr.process(input=XTest)
```

We can also score this MLPipeline based on the metric we chose with the `score` function:

```
score = trainedLr.score(input=XTest, targetOutput=yTest)
```

Once a model is trained, you can store it as a persisted MLSerialPipeline Type. You can then retrieve this model later in a different script or different component of the C3 AI Suite. Let's look at storing:

```
upsertedPipeline = trainedLr.upsert()
```

Now `upsertedPipeline` contains an `id` value of the upserted MLSerialPipeline object. We can retrieve this object one of two ways:

```
# Using the get function of the upsertedPipeline object
fetchedPipeline = upsertedPipeline.get()

# Using the MLSerialPipeline get function with the id
pipeline_id = upsertedPipeline.id
fetchedPipeline = c3.MLSerialPipeline.get(pipeline_id)
```

Now you can use `process` on new data with the fetched pipeline!

Using snippets from the DTI-created mnistExample, we'll show quickly how you can use the KerasPipe MLLeafPipe.

Suppose we have already prepared data train_X and train_Y. We can build a tensorflow keras model and use a KerasPipe. First, define your model as you usually would:

```
import tensorflow as tf

# Build tensorflow model.
one_layer_simple_model = tf.keras.Sequential([
    tf.keras.layers.Reshape((28*28,), input_shape=(28, 28)),
    tf.keras.layers.Dense(1000, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
one_layer_simple_model.compile(optimizer='Adam', loss='categorical_crossentropy')
```

Then, we create a KerasPipe Type from this model definition using the `upsertNativeModel` function. This function creates a new KerasPipe Type using the model definition object passed with the `model` parameter:

```
# Create the KerasPipe using the upsertNativeModel method.
keras_pipe = c3.KerasPipe().upsertNativeModel(model=one_layer_simple_model)
# Set the epoch number to an appropriate amount for your model and data.
keras_pipe.technique.numEpochs = 10
```
Then we can train the model as we did with the sklearn model:

```
trained_pipe = keras_pipe.train(input=train_X, targetOutput=train_Y)
```
And finally, we can use the trained pipe with the `process` function:

```
result = trained_pipe.process(input=test_X)
```
# Example Notebooks

Several C3 AI developed Jupyter notebooks exist which demonstrate the usage of these Pipeline Types:

- https://<vanity_url>/jupyter/notebooks/tutorials/TutorialIntroMLPipeline.ipynb
- https://<vanity_url>/jupyter/notebooks/tutorials/TutorialKerasPipe.ipynb
- https://<vanity_url>/jupyter/notebooks/tutorials/TutorialMLDataStreams.ipynb
- https://<vanity_url>/jupyter/notebooks/tutorials/TutorialProphetPipe.ipynb
- https://<vanity_url>/jupyter/notebooks/tutorials/TutorialStatsModelsTsaPipe.ipynb
- https://<vanity_url>/jupyter/notebooks/tutorials/TutorialTensorflowPipe.ipynb
- https://<vanity_url>/jupyter/notebooks/tutorials/TutorialXgBoostMLPipeline.ipynb

DTI developed notebooks:

- MNIST example: https://github.com/c3aidti/mnistExample

# Additional Resources

- Developer Documentation
