The Only Guide You’ll Ever Need to Understand the Core of PMML

The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format conceived by Dr. Robert Lee Grossman, then the director of the National Center for Data Mining at the University of Illinois at Chicago. PMML provides a way for analytic applications to describe and exchange predictive models produced by data mining and machine learning algorithms. It supports common models such as logistic regression and feedforward neural networks. Version 0.9 was published in 1998.[1] Subsequent versions have been developed by the Data Mining Group.[2]

To make this concrete, I have taken a PMML tree model example from the Data Mining Group.

I am using IntelliJ as my IDE:

This is how you should save your namespace in IntelliJ; it helps with auto-completion and autocorrection.

This is an overview of variable scoping in PMML:

Let's dissect PMML:

<PMML version="4.3" xmlns="http://www.dmg.org/PMML-4_3">

PMML is the root (outermost) tag of the XML document. The version attribute of the PMML tag specifies which version of PMML the document uses; here, it is version 4.3.

The published versions of PMML include:

  1. v.4.4
  2. v.4.3
  3. v.4.2.1
  4. v.4.1
  5. v.4.0.1
  6. v.3.2
  7. v.3.1
  8. v.3.0
  9. v.2.1
  10. v.2.0
  11. v.1.1

xmlns="http://www.dmg.org/PMML-4_3"

This declares the default XML namespace of the document, which identifies the PMML schema version we are using.
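To see how a program reads these two attributes, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name describe_root is my own, chosen for illustration; it is not part of any PMML library.

```python
import xml.etree.ElementTree as ET

# The root tag from the example above, closed immediately so it parses on its own.
PMML_TEXT = '<PMML version="4.3" xmlns="http://www.dmg.org/PMML-4_3"></PMML>'


def describe_root(pmml_text):
    """Return (local tag name, namespace URI, version) of the root element."""
    root = ET.fromstring(pmml_text)
    # ElementTree stores a namespaced tag as "{namespace-uri}LocalName".
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = "", root.tag
    return local_name, namespace, root.get("version")


print(describe_root(PMML_TEXT))
# ('PMML', 'http://www.dmg.org/PMML-4_3', '4.3')
```

Note that the namespace becomes part of every element name once it is declared, so lookups on a full PMML document have to include it.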

HEADER:

The Header gives a description of the PMML document, such as its copyright and the application that generated it.

<Header copyright="KNIME">
  <Application name="KNIME" version="2.8.0"/>
</Header>

DATA DICTIONARY:

The data dictionary contains information about the fields: their types and the values they can take.

<DataDictionary numberOfFields="5">
  <DataField name="sepal_length" optype="continuous" dataType="double">
    <Interval closure="closedClosed" leftMargin="4.3" rightMargin="7.9"/>
  </DataField>
  <DataField name="sepal_width" optype="continuous" dataType="double">
    <Interval closure="closedClosed" leftMargin="2.0" rightMargin="4.4"/>
  </DataField>
  <DataField name="petal_length" optype="continuous" dataType="double">
    <Interval closure="closedClosed" leftMargin="1.0" rightMargin="6.9"/>
  </DataField>
  <DataField name="petal_width" optype="continuous" dataType="double">
    <Interval closure="closedClosed" leftMargin="0.1" rightMargin="2.5"/>
  </DataField>
  <DataField name="class" optype="categorical" dataType="string">
    <Value value="Iris-setosa"/>
    <Value value="Iris-versicolor"/>
    <Value value="Iris-virginica"/>
  </DataField>
</DataDictionary>
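The Interval and Value children are what let a consumer decide whether an incoming value is valid for a field. Below is a rough sketch of that check in Python. It assumes a namespace-free fragment with only two of the five fields, handles only the closedClosed closure, and is_valid is a hypothetical helper name of mine.

```python
import xml.etree.ElementTree as ET

# Shortened, namespace-free copy of the dictionary above (two fields only).
DATA_DICTIONARY = """
<DataDictionary numberOfFields="2">
  <DataField name="petal_width" optype="continuous" dataType="double">
    <Interval closure="closedClosed" leftMargin="0.1" rightMargin="2.5"/>
  </DataField>
  <DataField name="class" optype="categorical" dataType="string">
    <Value value="Iris-setosa"/>
    <Value value="Iris-versicolor"/>
    <Value value="Iris-virginica"/>
  </DataField>
</DataDictionary>
"""


def is_valid(dictionary, field_name, value):
    """Check a value against the Interval or Value children of a DataField."""
    for field in dictionary.findall("DataField"):
        if field.get("name") != field_name:
            continue
        intervals = field.findall("Interval")
        allowed = [v.get("value") for v in field.findall("Value")]
        if intervals:
            # Assumption: every interval is closedClosed, so both ends are inclusive.
            return any(
                float(i.get("leftMargin")) <= float(value) <= float(i.get("rightMargin"))
                for i in intervals
            )
        if allowed:
            return str(value) in allowed
        return True  # the field declares no restriction
    raise KeyError(field_name)


dictionary = ET.fromstring(DATA_DICTIONARY)
print(is_valid(dictionary, "petal_width", 2.0))        # inside [0.1, 2.5]
print(is_valid(dictionary, "petal_width", 4.0))        # outside the interval
print(is_valid(dictionary, "class", "Iris-setosa"))    # listed category
```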

Model:

<!-- ACTUAL MODEL -->
<TreeModel modelName="DecisionTree" functionName="classification" splitCharacteristic="binarySplit"
missingValueStrategy="lastPrediction" noTrueChildStrategy="returnNullPrediction">

Mining Schema:

The mining schema lists the fields the model operates on: typically many input fields and one or more fields to predict.

<!-- MINING SCHEMA SPECIFIES WHICH FIELDS ARE THE DECISION-MAKING AUTHORITIES. IT IS THE GATEKEEPER THROUGH WHICH ALL THE DATA -->
<!-- IS PASSED. -->
<MiningSchema>
  <MiningField name="sepal_length" usageType="active" invalidValueTreatment="asIs"/>
  <MiningField name="sepal_width" usageType="active" invalidValueTreatment="asIs"/>
  <MiningField name="petal_length" usageType="active" invalidValueTreatment="asIs"/>
  <MiningField name="petal_width" usageType="active" invalidValueTreatment="asIs"/>
  <MiningField name="class" invalidValueTreatment="asIs" usageType="predicted"/>
</MiningSchema>
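To separate the inputs from the target, a consumer can group the MiningField elements by their usageType. Here is a small sketch in Python. It assumes a namespace-free fragment, leaves out the invalidValueTreatment attributes for brevity, and fields_by_usage is a name I made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace-free copy of the schema above, without invalidValueTreatment.
MINING_SCHEMA = """
<MiningSchema>
  <MiningField name="sepal_length" usageType="active"/>
  <MiningField name="sepal_width" usageType="active"/>
  <MiningField name="petal_length" usageType="active"/>
  <MiningField name="petal_width" usageType="active"/>
  <MiningField name="class" usageType="predicted"/>
</MiningSchema>
"""


def fields_by_usage(schema):
    """Group MiningField names by their usageType attribute."""
    groups = {}
    for field in schema.findall("MiningField"):
        # PMML treats a missing usageType as "active".
        usage = field.get("usageType", "active")
        groups.setdefault(usage, []).append(field.get("name"))
    return groups


schema = ET.fromstring(MINING_SCHEMA)
print(fields_by_usage(schema))
```

The four active fields are the inputs a scoring engine expects in each record, and class is the field it produces.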

NODE: The structure may change from model to model; in the case of the tree model, it contains nested nodes.

<Node id="0" score="Iris-setosa" recordCount="150.0">
  <!--
    THE BASE NODE IS ALWAYS EVALUATED, SO ITS PREDICATE IS TRUE.
    Every Node contains a predicate (here True or SimplePredicate) that identifies a rule for choosing itself or any of its siblings.
    A predicate may be an expression composed of other nested predicates.
    The value of recordCount in a Node serves as a base size for recordCount values in ScoreDistribution elements.
  -->
  <True/>
  <!--
    ScoreDistribution: an element of Node that represents the segments of the score that a Node predicts in a classification model.
    value: this attribute of ScoreDistribution is the label in a classification model.
    recordCount: this attribute of ScoreDistribution is the size (in number of records) associated with the value attribute.
  -->
  <ScoreDistribution value="Iris-setosa" recordCount="50.0"/>
  <ScoreDistribution value="Iris-versicolor" recordCount="50.0"/>
  <ScoreDistribution value="Iris-virginica" recordCount="50.0"/>
  <Node id="1" score="Iris-setosa" recordCount="50.0">
    <!--
      If the value of petal_width <= 0.6, the Node with id 1 is chosen.
      Iris-setosa accounts for 50 of the 50 records in Node 1 (recordCount = 50), that is, 100%.
    -->
    <SimplePredicate field="petal_width" operator="lessOrEqual" value="0.6"/>
    <ScoreDistribution value="Iris-setosa" recordCount="50.0"/>
    <ScoreDistribution value="Iris-versicolor" recordCount="0.0"/>
    <ScoreDistribution value="Iris-virginica" recordCount="0.0"/>
  </Node>
  <Node id="2" score="Iris-versicolor" recordCount="100.0">
    <SimplePredicate field="petal_width" operator="greaterThan" value="0.6"/>
    <ScoreDistribution value="Iris-setosa" recordCount="0.0"/>
    <ScoreDistribution value="Iris-versicolor" recordCount="50.0"/>
    <ScoreDistribution value="Iris-virginica" recordCount="50.0"/>
    <Node id="3" score="Iris-versicolor" recordCount="54.0">
      <SimplePredicate field="petal_width" operator="lessOrEqual" value="1.7"/>
      <ScoreDistribution value="Iris-setosa" recordCount="0.0"/>
      <ScoreDistribution value="Iris-versicolor" recordCount="49.0"/>
      <ScoreDistribution value="Iris-virginica" recordCount="5.0"/>
    </Node>
    <Node id="10" score="Iris-virginica" recordCount="46.0">
      <SimplePredicate field="petal_width" operator="greaterThan" value="1.7"/>
      <ScoreDistribution value="Iris-setosa" recordCount="0.0"/>
      <ScoreDistribution value="Iris-versicolor" recordCount="1.0"/>
      <ScoreDistribution value="Iris-virginica" recordCount="45.0"/>
    </Node>
  </Node>
</Node>
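To show how these nodes turn into a prediction, here is a minimal scoring sketch in Python. It is not how any particular PMML engine works. It assumes a namespace-free copy of the tree, supports only the True and SimplePredicate predicates with four comparison operators, ignores missing values, and assumes the root predicate is True. The names find_leaf and probabilities are my own.

```python
import operator
import xml.etree.ElementTree as ET

# Namespace-free copy of the tree; ScoreDistribution is kept only on the leaves.
TREE = """
<Node id="0" score="Iris-setosa" recordCount="150.0">
  <True/>
  <Node id="1" score="Iris-setosa" recordCount="50.0">
    <SimplePredicate field="petal_width" operator="lessOrEqual" value="0.6"/>
    <ScoreDistribution value="Iris-setosa" recordCount="50.0"/>
  </Node>
  <Node id="2" score="Iris-versicolor" recordCount="100.0">
    <SimplePredicate field="petal_width" operator="greaterThan" value="0.6"/>
    <Node id="3" score="Iris-versicolor" recordCount="54.0">
      <SimplePredicate field="petal_width" operator="lessOrEqual" value="1.7"/>
      <ScoreDistribution value="Iris-versicolor" recordCount="49.0"/>
      <ScoreDistribution value="Iris-virginica" recordCount="5.0"/>
    </Node>
    <Node id="10" score="Iris-virginica" recordCount="46.0">
      <SimplePredicate field="petal_width" operator="greaterThan" value="1.7"/>
      <ScoreDistribution value="Iris-versicolor" recordCount="1.0"/>
      <ScoreDistribution value="Iris-virginica" recordCount="45.0"/>
    </Node>
  </Node>
</Node>
"""

OPERATORS = {
    "lessOrEqual": operator.le,
    "lessThan": operator.lt,
    "greaterOrEqual": operator.ge,
    "greaterThan": operator.gt,
}


def matches(node, record):
    """Evaluate the node's predicate (True or SimplePredicate) for a record."""
    if node.find("True") is not None:
        return True
    predicate = node.find("SimplePredicate")
    compare = OPERATORS[predicate.get("operator")]
    return compare(float(record[predicate.get("field")]), float(predicate.get("value")))


def find_leaf(node, record):
    """Follow the first matching child at each level until a leaf is reached."""
    children = node.findall("Node")
    if not children:
        return node
    for child in children:
        if matches(child, record):
            return find_leaf(child, record)
    return None  # mirrors noTrueChildStrategy="returnNullPrediction"


def probabilities(node):
    """Divide each ScoreDistribution recordCount by the node's recordCount."""
    total = float(node.get("recordCount"))
    return {
        d.get("value"): float(d.get("recordCount")) / total
        for d in node.findall("ScoreDistribution")
    }


root = ET.fromstring(TREE)
leaf = find_leaf(root, {"petal_width": 1.5})
print(leaf.get("id"), leaf.get("score"), probabilities(leaf))
```

For petal_width = 1.5 the walk goes from Node 0 to Node 2 to Node 3, so the score is Iris-versicolor with 49 of the 54 records in that node.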

Model Verification:

<ModelVerification>
<!-- VerificationFields element contains the fields that will appear in the verification records. -->
<VerificationFields>
  <VerificationField field="sepal_length"/>
  <VerificationField field="sepal_width"/>
  <VerificationField field="petal_length"/>
  <VerificationField field="petal_width"/>
  <VerificationField field="class"/>
</VerificationFields>
<InlineTable>
<row>
<sepal_length>4.2</sepal_length>
<sepal_width>3.2</sepal_width>
<petal_length>3.2</petal_length>
<petal_width>2</petal_width>
<class>Iris-virginica</class>
</row><row>
<sepal_length>7.9</sepal_length>
<sepal_width>4.4</sepal_width>
<petal_length>6.9</petal_length>
<petal_width>4</petal_width>
<class>Iris-virginica</class>
</row>
</InlineTable>
</ModelVerification>
</TreeModel>
</PMML>
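The idea behind model verification is that a consumer scores each record of the InlineTable and compares the result with the expected value stored in the same row. Here is a rough sketch of that loop in Python. The two rows are illustrative records of my own with only the petal_width and class columns, and predict is a hand-written stand-in that hard-codes the tree's two thresholds (0.6 and 1.7) rather than reading them from the PMML.

```python
import xml.etree.ElementTree as ET

# Illustrative verification records (not the article's full table).
INLINE_TABLE = """
<InlineTable>
  <row>
    <petal_width>2</petal_width>
    <class>Iris-virginica</class>
  </row>
  <row>
    <petal_width>0.2</petal_width>
    <class>Iris-setosa</class>
  </row>
</InlineTable>
"""


def predict(record):
    """Hypothetical stand-in for the tree: same thresholds as its Node elements."""
    width = float(record["petal_width"])
    if width <= 0.6:
        return "Iris-setosa"
    return "Iris-versicolor" if width <= 1.7 else "Iris-virginica"


def verify(table, target="class"):
    """Return (row number, expected, actual) for every record that disagrees."""
    mismatches = []
    for number, row in enumerate(table.findall("row"), start=1):
        # Each child element of a row is one column: tag = field, text = value.
        record = {cell.tag: cell.text for cell in row}
        expected = record.pop(target)
        actual = predict(record)
        if actual != expected:
            mismatches.append((number, expected, actual))
    return mismatches


table = ET.fromstring(INLINE_TABLE)
print(verify(table))  # an empty list means every record verified
```

A non-empty result points at the rows where the deployed model and the model's author disagree, which is exactly what ModelVerification is meant to catch.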

Code on GitHub: PMML

Jr. Software Engineer working currently on Java 8.