Writing better Machine Learning evaluation protocols (Regression)
When you’re training machine learning models, one of the most important decisions you can make is choosing the right evaluation protocol. How you evaluate model performance can change the outcomes of your learning algorithms. When it comes to regression, most people stick to either MSE or RMSE. These are great metrics, but they are not without their flaws. There is also the popular R² metric, which does a great job of letting us compare models (and Adjusted R², its modified brother). But that’s not all. Sometimes in Machine Learning, you might end up writing your own regression metrics.
Many of these metrics are very similar to each other. However, metrics such as Poisson deviance are their own thing, and having them to compare against can give you some very interesting insights. I’ve worked with runs where one model had a better R²/MSE but a worse Poisson deviance. Looking at the squared error alone would not have told you the full story.
Both tables were generated using the same dataset. The only difference is the models themselves. Looking at the results, which would you choose? Think about it, and share in the comments below.
Now that you’ve seen the world beyond the standard MSE/RMSE/R², read on. In this article, I will share the code for generating reports like this. You can copy-paste the code I share, but I would also recommend thinking about which metric you should use when, and how you can improve upon the basic template I share here.
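To make that concrete before diving in, here is a minimal, self-contained sketch with made-up toy numbers (not the dataset behind the tables in this article) showing how MSE/R² and Poisson deviance can disagree about which of two models is better:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score, mean_poisson_deviance

# Toy targets and two hypothetical sets of predictions (illustrative only)
y_true = np.array([1.0, 2.0, 3.0, 10.0])
pred_a = np.array([3.0, 2.0, 3.0, 10.0])   # misses the smallest target by 2
pred_b = np.array([1.0, 2.0, 3.0, 13.0])   # misses the largest target by 3

for name, pred in [("Model A", pred_a), ("Model B", pred_b)]:
    print(name,
          "MSE:", mean_squared_error(y_true, pred),
          "R2:", r2_score(y_true, pred),
          "Poisson deviance:", mean_poisson_deviance(y_true, pred))

# On these numbers Model A wins on MSE and R², while Model B wins on Poisson
# deviance, which penalizes relative errors on small targets more heavily.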
Set up the basics
You obviously want to start by setting up the basics. We do so as follows:
import pandas as pd
from sklearn import model_selection, neighbors
from sklearn.linear_model import Lasso, ElasticNet
from sklearn.ensemble import RandomForestRegressor

names = ["RFR", "Lasso", "ElasticNet", "KNR"]
models = [RandomForestRegressor(), Lasso(), ElasticNet(), neighbors.KNeighborsRegressor()]
bestScore = 100000
scoring = ['neg_mean_squared_error', 'explained_variance', 'max_error',
           'neg_mean_absolute_error', 'neg_root_mean_squared_error',
           'neg_mean_squared_log_error', 'neg_median_absolute_error', 'r2',
           'neg_mean_poisson_deviance', 'neg_mean_gamma_deviance']
Notice that instead of hardcoding the values, we keep them in variables. This serves a few purposes. First, it makes changing the code later much easier. Want to add a new model or remove a weak one? Just tweak the models and names lists. Want to add your own funky metrics, or metrics from other sources? Change your scoring list. Second, it makes your code shorter. Instead of typing out 20 models, I can refer to them as a list. As you reuse your analysis, this convenience really adds up.
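One thing the snippet above leaves out is the data and the cross-validation splitter that the evaluation function below expects. Here is a minimal sketch, assuming a CSV file and a target column whose names ("your_dataset.csv" and "target") are placeholders for your own data:

import pandas as pd
from sklearn import model_selection

# Placeholder data loading: swap in your own dataset and target column
df = pd.read_csv("your_dataset.csv")
train_targ = df["target"]
train_data = df.drop(columns=["target"])

# 10-fold cross-validation splitter for cross_val_score
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=42)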
Write your error function
Now that we have our components set up, it is time to write our error calculation function.
def error_metrics(i, train_data, train_targ, kfold):
    """
    Compute all the relevant metrics for one model.
    @i: index of the model in the models list
    @train_data: X values (predictors)
    @train_targ: y values (targets)
    @kfold: cross-validation splitter (or number of folds)
    """
    model = [models[i]]
    metrics_df = pd.DataFrame()
    # Evaluate the chosen model against every metric in the scoring list
    for scor in scoring:
        score = []
        for mod in model:
            try:
                result = model_selection.cross_val_score(
                    estimator=mod, X=train_data, y=train_targ,
                    cv=kfold, scoring=scor)
                score.append(result.mean())
            except Exception:
                # Some metrics (e.g. Poisson/gamma deviance) fail on negative
                # targets or predictions; flag them instead of crashing
                score.append("Error not Applicable")
        metrics_df[scor] = pd.Series(score)
    return metrics_df
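With the function in place, generating a comparison report like the ones discussed earlier is just a matter of looping over the models. Here is a minimal sketch, assuming the names, models, scoring, train_data, train_targ, and kfold objects defined above:

# Build one row of metrics per model and stack them into a single report
report = pd.DataFrame()
for i, name in enumerate(names):
    row = error_metrics(i, train_data, train_targ, kfold)
    row.index = [name]
    report = pd.concat([report, row])

print(report)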
Most of the function itself is relatively straightforward. But right now we don’t have any support for custom error metrics. How do we add that? Look at the try-except section. Our custom/non-scikit errors will cause an exception when we hit the cross-validation call. We could add an additional check: if scor is one of our special metrics, we run its calculation through a contingency branch instead. If you want the best way to implement this, reach out to me. But I would recommend working on it by yourself first. It’s good practice. You can always reach out to me for review, pointers, and discussion. Happy coding.
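For reference, one possible starting point (a sketch only; my_custom_error and custom_scorers below are placeholder names of my own, not part of the article's code) is to wrap your custom metric with scikit-learn's make_scorer and resolve the name before the cross-validation call:

import numpy as np
from sklearn.metrics import make_scorer

# Placeholder custom metric: mean absolute error scaled by the target size
def my_custom_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), 1e-9))

# Register custom metrics under the names you add to the scoring list;
# greater_is_better=False keeps the "lower is better" convention
custom_scorers = {"my_custom_error": make_scorer(my_custom_error, greater_is_better=False)}
scoring.append("my_custom_error")

# Inside error_metrics, the only change needed is to resolve the name first:
#     scoring=custom_scorers.get(scor, scor)
# so registered names use their scorer object and everything else stays a plain string.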
If you liked this article, check out my other content. I post regularly on Medium, YouTube, Twitter, and Substack (all linked below). I focus on Artificial Intelligence, Machine Learning, Technology, and Software Development. If you’re preparing for coding interviews, check out Coding Interviews Made Simple, my free weekly newsletter. Feel free to reach out if you have any interesting projects/ideas for me as well.
If you’d like to support my work monetarily, my Venmo and PayPal are below. Any amount is appreciated and helps a lot. Donations unlock exclusive content such as paper analysis, special code, consultations, and reduced rates for mock interviews:
Venmo: https://account.venmo.com/u/FNU-Devansh
Paypal: paypal.me/ISeeThings
Reach out to me
If this article got you interested in reaching out to me, then this section is for you. You can reach out to me on any of the platforms below, or check out any of my other content. If you’d like to discuss tutoring, text me on LinkedIn, IG, or Twitter. If you’d like to support my work, use my free Robinhood referral link. We both get a free stock, and there is no risk to you. So not using it is just losing free money.
Check out my other articles on Medium: https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819
If you’re preparing for coding/technical interviews: https://codinginterviewsmadesimple.substack.com/
Get a free stock on Robinhood: https://join.robinhood.com/fnud75