# -*- coding: utf-8 -*-
"""
.. currentmodule:: modlamp.ml

.. moduleauthor:: modlab Gisela Gabernet ETH Zurich <gisela.gabernet@pharma.ethz.ch>,
                  Alex Mueller ETH Zurich <alex.mueller@pharma.ethz.ch>

This module contains different functions to facilitate machine learning with peptides, mainly making use of the
scikit-learn Python package. Two machine learning models are available, whose parameters can be tuned. For more
information on these machine learning models, please check the `scikit-learn documentation <http://scikit-learn.org>`_.

**The two available machine learning models in this module are:**

=========================   =======================================================================================
Model                       Reference :sup:`[1]`
=========================   =======================================================================================
Support Vector Machine      `sklearn.svm.SVC <http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html>`_
Random Forest               `sklearn.ensemble.RandomForestClassifier <http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html>`_
=========================   =======================================================================================

[1] F. Pedregosa *et al., J. Mach. Learn. Res.* **2011**, 12, 2825–2830.

**The following functions are included in this module:**

============================================    ========================================================================
Function name                                   Description
============================================    ========================================================================
:py:func:`modlamp.ml.train_best_model`          Performs a grid search on different model parameters and returns the
                                                best fitted model and its performance as the cross-validation MCC.
:py:func:`modlamp.ml.plot_validation_curve`     Plots a validation curve for any parameter from the grid search.
:py:func:`modlamp.ml.predict`                   Predicts the class labels or class probabilities for given peptides.
:py:func:`modlamp.ml.score_cv`                  Evaluates the performance of a given model through cross-validation.
:py:func:`modlamp.ml.score_testset`             Evaluates the performance of a given model through test-set prediction.
============================================    ========================================================================


.. versionadded:: 2.2.0
.. versionchanged:: 2.7.8
"""

import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.model_selection import validation_curve
from sklearn import metrics as mets
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.base import clone

__author__ = "Alex Müller, Gisela Gabernet"
__docformat__ = "restructuredtext en"


def train_best_model(model, x_train, y_train, sample_weights=None, scaler=StandardScaler(),
                     score=make_scorer(matthews_corrcoef), param_grid=None, n_jobs=-1, cv=10):
    """This function performs a parameter grid search on a selected classifier model and peptide training data set.
    It returns a scikit-learn pipeline that performs standard scaling and contains the best model found by the grid
    search according to the Matthews correlation coefficient
    (see `sklearn.preprocessing <http://scikit-learn.org/stable/modules/preprocessing.html>`_,
    `sklearn.grid_search <http://scikit-learn.org/stable/modules/grid_search.html#exhaustive-grid-search>`_,
    `sklearn.pipeline.Pipeline <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline>`_).

    :param model: {str} model to train. Choose between ``'svm'`` (Support Vector Machine) and ``'rf'`` (Random Forest).
    :param x_train: {array} descriptor values for training data.
    :param y_train: {array} class values for training data.
    :param sample_weights: {array} sample weights for training data.
    :param scaler: {scaler} scaler to use in the pipe to scale data prior to training. Choose from
        ``sklearn.preprocessing``, e.g. ``StandardScaler()``, ``MinMaxScaler()``, ``Normalizer()``.
    :param score: {metrics instance} scoring function built from ``make_scorer()`` or a predefined value in string form
        (choose from the scikit-learn
        `scoring-parameters <http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter>`_).
    :param param_grid: {dict} parameter grid for the grid search (see
        `sklearn.grid_search <http://scikit-learn.org/stable/modules/model_evaluation.html>`_).
    :param n_jobs: {int} number of parallel jobs to use for calculation. If ``-1``, all available cores are used.
    :param cv: {int} number of folds for cross-validation.
    :return: best estimator pipeline.

    **Default parameter grids:**

    ==============    ===============================================================================
    Model             Parameter grid
    ==============    ===============================================================================
    SVM Model         param_grid = [{'clf__C': param_range, 'clf__kernel': ['linear']},
                      {'clf__C': param_range, 'clf__gamma': param_range, 'clf__kernel': ['rbf']}]
    Random Forest     param_grid = [{'clf__n_estimators': [10, 100, 500],
                      'clf__max_features': ['sqrt', 'log2'], 'clf__bootstrap': [True],
                      'clf__criterion': ["gini"]}]
    ==============    ===============================================================================

    **Useful methods implemented in scikit-learn:**

    =================   =============================================================
    Method              Description
    =================   =============================================================
    fit(x, y)           fit the model with the same parameters to new training data.
    score(x, y)         get the score of the model for test data.
    predict(x)          get predictions for new data.
    predict_proba(x)    get probability predictions for [class0, class1].
    get_params()        get the parameters of the trained model.
    =================   =============================================================

    :Example:

    >>> from modlamp.ml import train_best_model
    >>> from modlamp.datasets import load_ACPvsRandom
    >>> from modlamp.descriptors import PeptideDescriptor

    Loading a dataset for training:

    >>> data = load_ACPvsRandom()
    >>> len(data.sequences)
    826
    >>> list(data.target_names)
    ['Random', 'ACP']

    Calculating the pepCATS descriptor values in auto-correlation modality:

    >>> descr = PeptideDescriptor(data.sequences, scalename='pepcats')
    >>> descr.calculate_autocorr(7)
    >>> descr.descriptor.shape
    (826, 42)
    >>> descr.descriptor
    array([[ 1.        ,  0.15      ,  0.        , ...,  0.35714286,  0.21428571,  0.        ],
           [ 0.64      ,  0.12      ,  0.32      , ...,  0.05263158,  0.        ,  0.        ],
           [ 1.        ,  0.23809524,  0.        , ...,  0.53333333,  0.26666667,  0.        ],
           ...,
           [ 0.5       ,  0.22222222,  0.44444444, ...,  0.33333333,  0.        ,  0.        ],
           [ 0.70588235,  0.17647059,  0.23529412, ...,  0.09090909,  0.09090909,  0.        ],
           [ 0.6875    ,  0.1875    ,  0.1875    , ...,  0.2       ,  0.        ,  0.        ]])

    Training an SVM model on this descriptor data:

    >>> X_train = descr.descriptor
    >>> y_train = data.target
    >>> best_svm_model = train_best_model('svm', X_train, y_train)
    Best score (scorer: make_scorer(matthews_corrcoef)) and parameters from a 10-fold cross validation:
    0.739995453978
    {'clf__gamma': 0.1, 'clf__C': 10.0, 'clf__kernel': 'rbf'}
    >>> best_svm_model.get_params()
    {'clf': SVC(C=10.0, cache_size=200, class_weight='balanced', coef0=0.0,
        decision_function_shape=None, degree=3, gamma=0.1, kernel='rbf',
        max_iter=-1, probability=True, random_state=1, shrinking=True,
        tol=0.001, verbose=False),
     ...
     'steps': [('scl', StandardScaler(copy=True, with_mean=True, with_std=True)),
               ('clf', SVC(C=10.0, cache_size=200, class_weight='balanced', coef0=0.0,
                   decision_function_shape=None, degree=3, gamma=0.1, kernel='rbf',
                   max_iter=-1, probability=True, random_state=1, shrinking=True,
                   tol=0.001, verbose=False))]}
    """
    print("performing grid search...")

    # build the scaler + classifier pipeline and the default grid for the chosen model
    if model.lower() == 'svm':
        pipe = Pipeline([('scl', scaler),
                         ('clf', SVC(class_weight='balanced', random_state=1, probability=True))])
        if param_grid is None:
            param_range = [0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0, 10.0]
            param_grid = [{'clf__C': param_range, 'clf__kernel': ['linear']},
                          {'clf__C': param_range, 'clf__gamma': param_range, 'clf__kernel': ['rbf']}]
    elif model.lower() == 'rf':
        pipe = Pipeline([('scl', scaler),
                         ('clf', RandomForestClassifier(random_state=1, class_weight='balanced'))])
        if param_grid is None:
            param_grid = [{'clf__n_estimators': [10, 100, 500],
                           'clf__max_features': ['sqrt', 'log2'],
                           'clf__bootstrap': [True],
                           'clf__criterion': ["gini"]}]
    else:
        raise ValueError("Model not supported, please choose between 'svm' and 'rf'.")

    gs = GridSearchCV(estimator=pipe, param_grid=param_grid, scoring=score, cv=cv, n_jobs=n_jobs)

    # sample weights must be routed to the classifier step as a fit parameter; the
    # explicit None check avoids the ambiguous truth value of a numpy array.
    if sample_weights is not None:
        gs.fit(x_train, y_train, clf__sample_weight=sample_weights)
    else:
        gs.fit(x_train, y_train)

    print("Best score (scorer: %s) and parameters from a %d-fold cross validation:" % (score, cv))
    print("MCC score:\t%.3f" % gs.best_score_)
    print("Parameters:\t%s" % gs.best_params_)

    # refit the best estimator on the whole training data
    best_classifier = gs.best_estimator_
    return best_classifier.fit(x_train, y_train)
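

# The following is a minimal usage sketch and NOT part of the original module
# API: it shows how to pass a custom ``param_grid`` and ``scaler`` to
# ``train_best_model`` instead of relying on the defaults. Grid keys need the
# ``clf__`` prefix because the estimator sits behind the 'clf' pipeline step.
# The dataset and descriptor calls mirror the doctests above; the grid values
# and the helper name ``_example_custom_grid`` are illustrative assumptions.
def _example_custom_grid():
    from modlamp.datasets import load_ACPvsRandom
    from modlamp.descriptors import PeptideDescriptor
    from sklearn.preprocessing import MinMaxScaler

    data = load_ACPvsRandom()
    desc = PeptideDescriptor(data.sequences, scalename='pepcats')
    desc.calculate_autocorr(7)

    # restrict the search to an RBF-kernel SVM with a small, hand-picked grid
    grid = [{'clf__C': [1.0, 10.0], 'clf__gamma': [0.01, 0.1], 'clf__kernel': ['rbf']}]
    return train_best_model('svm', desc.descriptor, data.target,
                            scaler=MinMaxScaler(), param_grid=grid, cv=5)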


def plot_validation_curve(classifier, x_train, y_train, param_name, param_range, cv=10,
                          score=make_scorer(matthews_corrcoef), title="Validation Curve",
                          xlab="parameter range", ylab="MCC", n_jobs=-1, filename=None):
    """This function plots a cross-validation curve for the specified classifier on all tested parameters given in the
    option ``param_range``.

    :param classifier: {classifier instance} classifier or pipeline for which the validation curve is calculated
        (e.g. ``sklearn.svm.SVC``).
    :param x_train: {array} descriptor values for training data.
    :param y_train: {array} class values for training data.
    :param param_name: {string} parameter to assess in the validation curve plot. For SVM, ``'clf__C'`` (C parameter)
        or ``'clf__gamma'`` (gamma parameter). For Random Forest, ``'clf__n_estimators'`` (number of trees),
        ``'clf__max_depth'`` (max number of branches per tree), ``'clf__min_samples_split'`` (min number of samples
        required to split an internal tree node) or ``'clf__min_samples_leaf'`` (min number of samples in a newly
        created leaf).
    :param param_range: {list} parameter range for the validation curve.
    :param cv: {int} number of folds for cross-validation.
    :param score: {metrics instance} scoring function built from ``make_scorer()`` or a predefined value in string form
        (`sklearn.model_evaluation.scoring-parameter
        <http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter>`_).
    :param title: {str} graph title.
    :param xlab: {str} x axis label.
    :param ylab: {str} y axis label.
    :param n_jobs: {int} number of parallel jobs to use for calculation. If ``-1``, all available cores are used.
    :param filename: {str} if a filename is given, the figure is stored in the specified path.
    :return: plot of the validation curve.

    :Example:

    >>> from modlamp.ml import train_best_model, plot_validation_curve
    >>> from modlamp.datasets import load_ACPvsRandom
    >>> from modlamp.descriptors import PeptideDescriptor

    Loading a dataset for training:

    >>> data = load_ACPvsRandom()
    >>> len(data.sequences)
    826
    >>> list(data.target_names)
    ['Random', 'ACP']

    Calculating the pepCATS descriptor values in auto-correlation modality:

    >>> descr = PeptideDescriptor(data.sequences, scalename='pepcats')
    >>> descr.calculate_autocorr(7)
    >>> descr.descriptor.shape
    (826, 42)

    Training an SVM model with this data:

    >>> X_train = descr.descriptor
    >>> y_train = data.target
    >>> best_svm_model = train_best_model('svm', X_train, y_train)
    Best score (scorer: make_scorer(matthews_corrcoef)) and parameters from a 10-fold cross validation:
    0.739995453978
    {'clf__gamma': 0.1, 'clf__C': 10.0, 'clf__kernel': 'rbf'}

    Plotting the validation curve for the gamma parameter:

    >>> plot_validation_curve(best_svm_model, X_train, y_train, param_name='clf__gamma',
    ...                       param_range=[0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000])

    .. image:: ../docs/static/validation_curve.png
        :height: 300px
    """
    train_scores, test_scores = validation_curve(classifier, x_train, y_train, param_name=param_name,
                                                 param_range=param_range, cv=cv, scoring=score, n_jobs=n_jobs)
    train_mean = np.mean(train_scores, axis=1)
    train_std = np.std(train_scores, axis=1)
    test_mean = np.mean(test_scores, axis=1)
    test_std = np.std(test_scores, axis=1)

    # plotting
    plt.clf()
    plt.title(title)
    plt.xlabel(xlab)
    plt.ylabel(ylab)
    plt.ylim(0.0, 1.1)
    plt.semilogx(param_range, train_mean, label="Training score", color="b")
    plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, alpha=0.2, color="b")
    plt.semilogx(param_range, test_mean, label="Cross-validation score", color="g")
    plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, alpha=0.2, color="g")
    plt.legend(loc="best")
    if filename:
        plt.savefig(filename)
    else:
        plt.show()
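

# A hedged sketch (not part of the original module API): plotting a validation
# curve for a Random Forest over ``clf__n_estimators``, one of the parameter
# names listed in the docstring above. The tree counts in ``param_range`` and
# the helper name are illustrative assumptions.
def _example_rf_validation_curve():
    from modlamp.datasets import load_ACPvsRandom
    from modlamp.descriptors import PeptideDescriptor

    data = load_ACPvsRandom()
    desc = PeptideDescriptor(data.sequences, scalename='pepcats')
    desc.calculate_autocorr(7)
    model = train_best_model('rf', desc.descriptor, data.target)
    plot_validation_curve(model, desc.descriptor, data.target,
                          param_name='clf__n_estimators',
                          param_range=[10, 50, 100, 250, 500],
                          xlab="number of trees")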


def predict(classifier, x, seqs, names=None, y=None, filename=None):
    """This function can be used to predict novel peptides with a trained classifier model. The function returns a
    ``pandas.DataFrame`` with predictions using the specified estimator and test data. If the true classes are
    provided, they are included in the output as a ``True_class`` column for comparison.

    :param classifier: {classifier instance} classifier used for predictions.
    :param x: {array} descriptor values of the peptides to be predicted.
    :param seqs: {list} sequences of the peptides in ``x``.
    :param names: {list} (optional) names of the peptides in ``x``.
    :param y: {array} (optional) true (known) classes of the peptides.
    :param filename: {string} (optional) output filename to store the predictions to (``.csv`` format); if ``None``,
        the predictions are not saved.
    :return: ``pandas.DataFrame`` containing predictions for ``x``. ``P_class0`` and ``P_class1`` are the predicted
        probabilities of the peptide belonging to class 0 and class 1, respectively.

    :Example:

    >>> from modlamp.ml import train_best_model, predict
    >>> from modlamp.datasets import load_ACPvsRandom
    >>> from modlamp.descriptors import PeptideDescriptor
    >>> from modlamp.sequences import Helices

    Loading data for model training:

    >>> data = load_ACPvsRandom()

    Calculating descriptor values from the data:

    >>> desc = PeptideDescriptor(data.sequences, scalename='pepcats')
    >>> desc.calculate_autocorr(7)
    >>> best_svm_model = train_best_model('svm', desc.descriptor, data.target)

    Generating 10 *de novo* helical sequences to predict their activity:

    >>> H = Helices(seqnum=10, lenmin=7, lenmax=30)
    >>> H.generate_sequences()

    Calculating descriptor values for the newly generated sequences:

    >>> descH = PeptideDescriptor(H.sequences, scalename='pepcats')
    >>> descH.calculate_autocorr(7)
    >>> df = predict(best_svm_model, x=descH.descriptor, seqs=descH.sequences)
    >>> df.head(3)  # all three shown sequences are predicted active (class 1)
                               P_class0  P_class1
    IAGKLAKVGLKIGKIGGKLVKGVLK  0.009167  0.990833
    LGVRVLRIIIR                0.007239  0.992761
    VGIRLARGVGRIG              0.071436  0.928564
    """
    preds = classifier.predict_proba(x)

    # assemble the output depending on which optional information was supplied;
    # explicit None checks avoid ambiguous truth values for arrays.
    d_pred = {'P_class0': preds[:, 0], 'P_class1': preds[:, 1]}
    if names is not None:
        d_pred['Name'] = names
    if y is not None:
        d_pred['True_class'] = y
    df_pred = pd.DataFrame(d_pred, index=seqs)

    if filename:
        # append a timestamp to the given filename before saving
        df_pred.to_csv(filename + time.strftime("-%Y%m%d-%H%M%S.csv"))
    return df_pred
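

# A short sketch (not part of the original module API) of the optional ``names``
# and ``y`` arguments of ``predict``: when true classes are supplied, the
# returned frame gains a ``True_class`` column next to the predicted
# probabilities. The placeholder peptide names are illustrative assumptions.
def _example_predict_with_truth():
    from modlamp.datasets import load_ACPvsRandom
    from modlamp.descriptors import PeptideDescriptor
    from sklearn.model_selection import train_test_split

    data = load_ACPvsRandom()
    desc = PeptideDescriptor(data.sequences, scalename='pepcats')
    desc.calculate_autocorr(7)
    x_tr, x_te, y_tr, y_te, seq_tr, seq_te = train_test_split(
        desc.descriptor, data.target, data.sequences, test_size=0.2, random_state=1)

    model = train_best_model('svm', x_tr, y_tr)
    names = ['pep%i' % i for i in range(len(seq_te))]  # placeholder names
    return predict(model, x=x_te, seqs=seq_te, names=names, y=y_te)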


def score_cv(classifier, x, y, sample_weights=None, cv=10, shuffle=True):
    """This function can be used to evaluate the performance of a selected classifier model. It returns the average
    **cross-validation scores** for the specified scoring metrics in a ``pandas.DataFrame``.

    :param classifier: {classifier instance} a classifier model to be evaluated.
    :param x: {array} descriptor values for training data.
    :param y: {array} class values for training data.
    :param sample_weights: {array} weights for training data.
    :param cv: {int} number of folds for cross-validation.
    :param shuffle: {bool} shuffle data before making the K-fold split.
    :return: ``pandas.DataFrame`` containing the cross-validation scores for the specified metrics.

    :Example:

    >>> from modlamp.ml import train_best_model, score_cv
    >>> from modlamp.datasets import load_ACPvsRandom
    >>> from modlamp.descriptors import PeptideDescriptor

    Loading data for model training:

    >>> data = load_ACPvsRandom()

    Calculating descriptor values from the data:

    >>> desc = PeptideDescriptor(data.sequences, scalename='pepcats')
    >>> desc.calculate_autocorr(7)
    >>> best_svm_model = train_best_model('svm', desc.descriptor, data.target)

    Get the cross-validation scores:

    >>> score_cv(best_svm_model, desc.descriptor, data.target, cv=5)
                CV_0   CV_1   CV_2   CV_3   CV_4   mean    std
    MCC        0.785  0.904  0.788  0.757  0.735  0.794  0.059
    accuracy   0.892  0.952  0.892  0.880  0.867  0.896  0.029
    precision  0.927  0.974  0.953  0.842  0.884  0.916  0.048
    recall     0.864  0.925  0.854  0.889  0.864  0.879  0.026
    f1         0.894  0.949  0.901  0.865  0.874  0.896  0.029
    roc_auc    0.893  0.951  0.899  0.881  0.868  0.898  0.028
    """
    cv_names = ["CV_%i" % i for i in range(cv)]
    cv_scores = []
    funcs = ['matthews_corrcoef', 'accuracy_score', 'precision_score', 'recall_score', 'f1_score',
             'roc_auc_score', 'confusion_matrix']
    metrics = ['MCC', 'accuracy', 'precision', 'recall', 'f1', 'roc_auc', 'TN', 'FP', 'FN', 'TP',
               'FDR', 'sensitivity', 'specificity']

    # a fixed random_state is only valid when shuffling is enabled
    kf = StratifiedKFold(n_splits=cv, shuffle=shuffle, random_state=42 if shuffle else None)
    clf = clone(classifier)

    for fold_train_index, fold_test_index in kf.split(x, y):
        xcv_train, xcv_test = x[fold_train_index], x[fold_test_index]
        ycv_train, ycv_test = y[fold_train_index], y[fold_test_index]

        if sample_weights is not None:
            weightcv_train, weightcv_test = sample_weights[fold_train_index], sample_weights[fold_test_index]
            # note: for a pipeline, the fit parameter would need the step prefix
            # (e.g. ``clf__sample_weight``); a bare estimator takes it directly.
            clf.fit(xcv_train, ycv_train, sample_weight=weightcv_train)
        else:
            weightcv_test = None
            clf.fit(xcv_train, ycv_train)

        # calculate every metric on the fold's test split
        y_pred = clf.predict(xcv_test)
        scores = [getattr(mets, f)(ycv_test, y_pred, sample_weight=weightcv_test) for f in funcs]

        # unpack the confusion matrix (last entry) into its four cells and derive
        # the false discovery rate, sensitivity and specificity from them
        tn, fp, fn, tp = scores.pop().ravel()
        scores.extend([tn, fp, fn, tp])
        scores.append(float(fp) / float(tp + fp))  # FDR
        scores.append(float(tp) / float(tp + fn))  # sensitivity
        scores.append(float(tn) / float(tn + fp))  # specificity
        cv_scores.append(scores)

    df_scores = pd.DataFrame(dict(zip(cv_names, cv_scores)), index=metrics, columns=cv_names)
    # mean and std are computed over the CV columns only
    df_scores['mean'] = df_scores[cv_names].mean(axis=1)
    df_scores['std'] = df_scores[cv_names].std(axis=1)
    return df_scores.round(2)
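

# A brief sketch (not part of the original module API): cross-validation with
# per-sample weights. The uniform weight vector is a placeholder assumption; in
# practice it could encode e.g. per-peptide assay confidence. A bare
# (non-pipeline) classifier is used because ``score_cv`` passes
# ``sample_weight`` directly to the estimator's ``fit`` method.
def _example_weighted_cv():
    from modlamp.datasets import load_ACPvsRandom
    from modlamp.descriptors import PeptideDescriptor

    data = load_ACPvsRandom()
    desc = PeptideDescriptor(data.sequences, scalename='pepcats')
    desc.calculate_autocorr(7)
    weights = np.ones(len(data.target))  # uniform placeholder weights
    return score_cv(SVC(class_weight='balanced', random_state=1),
                    desc.descriptor, data.target, sample_weights=weights, cv=5)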


def score_testset(classifier, x_test, y_test, sample_weights=None):
    """Returns the test set scores for the specified scoring metrics in a ``pandas.DataFrame``. The calculated
    metrics are the Matthews correlation coefficient, accuracy, precision, recall, f1 and the area under the
    Receiver-Operator Curve (roc_auc). See
    `sklearn.metrics <http://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics>`_ for more
    information.

    :param classifier: {classifier instance} pre-trained classifier used for predictions.
    :param x_test: {array} descriptor values of the test data.
    :param y_test: {array} true class values of the test data.
    :param sample_weights: {array} weights for the test data.
    :return: ``pandas.DataFrame`` containing the test set scores for the specified metrics.

    :Example:

    >>> from modlamp.ml import train_best_model, score_testset
    >>> from sklearn.model_selection import train_test_split
    >>> from modlamp.datasets import load_ACPvsRandom
    >>> from modlamp import descriptors
    >>> data = load_ACPvsRandom()

    Calculating descriptor values from the data:

    >>> desc = descriptors.PeptideDescriptor(data.sequences, scalename='pepcats')
    >>> desc.calculate_autocorr(7)

    Splitting the data into train and test sets:

    >>> X_train, X_test, y_train, y_test = train_test_split(desc.descriptor, data.target, test_size=0.33)

    Training an SVM model on the training set:

    >>> best_svm_model = train_best_model('svm', X_train, y_train)

    Calculating the scores of the predictions on the test set:

    >>> score_testset(best_svm_model, X_test, y_test)
               Scores
    MCC         0.839
    accuracy    0.920
    precision   0.924
    recall      0.910
    f1          0.917
    roc_auc     0.919
    """
    scores = []
    metrics = ['MCC', 'accuracy', 'precision', 'recall', 'f1', 'roc_auc', 'TN', 'FP', 'FN', 'TP',
               'FDR', 'sensitivity', 'specificity']
    funcs = ['matthews_corrcoef', 'accuracy_score', 'precision_score', 'recall_score', 'f1_score',
             'roc_auc_score', 'confusion_matrix']

    # for every metric, calculate the test set score
    y_pred = classifier.predict(x_test)
    for f in funcs:
        scores.append(getattr(mets, f)(y_test, y_pred, sample_weight=sample_weights))

    # unpack the confusion matrix (last entry) into its four cells and derive the
    # false discovery rate, sensitivity and specificity from them
    tn, fp, fn, tp = scores.pop().ravel()
    scores.extend([tn, fp, fn, tp])
    scores.append(float(fp) / float(tp + fp))  # FDR
    scores.append(float(tp) / float(tp + fn))  # sensitivity
    scores.append(float(tn) / float(tn + fp))  # specificity

    df_scores = pd.DataFrame({'Scores': scores}, index=metrics)
    return df_scores.round(2)
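

# An end-to-end sketch (not part of the original module API) tying the functions
# above together when the module is run directly: train on a split, inspect the
# cross-validation scores, then score the held-out test set. The dataset and
# descriptor choices mirror the doctests above; the split sizes are illustrative.
if __name__ == '__main__':
    from modlamp.datasets import load_ACPvsRandom
    from modlamp.descriptors import PeptideDescriptor
    from sklearn.model_selection import train_test_split

    data = load_ACPvsRandom()
    desc = PeptideDescriptor(data.sequences, scalename='pepcats')
    desc.calculate_autocorr(7)
    x_tr, x_te, y_tr, y_te = train_test_split(desc.descriptor, data.target,
                                              test_size=0.25, random_state=42)

    best = train_best_model('svm', x_tr, y_tr, cv=5)
    print(score_cv(best, x_tr, y_tr, cv=5))
    print(score_testset(best, x_te, y_te))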