Effects of actions analysis

Author: Andrei Stefan
Date: 13-11-2023
Required files: anonymised_data/anonymised_data_demographic.csv, anonymised_data/anonymised_data_final.csv, anonymised_data/anonymised_data_prescreening.csv
Output files: the split data files in the data/anonymised_data directory, and the per-action, per-feature files in the data directory

This file contains the code to generate the files necessary to reproduce the analysis for the effects of the actions. Since all the files contain data about the participants, they are already anonymised.

In [1]:
# import all required packages
import csv
import numpy as np
import os
import pandas as pd

Define functions for splitting the dataset based on answers in the prescreening questionnaire. The files generated in this manner can be used with the effects_of_actions_individual(filename) and effects_of_actions_start_and_end(filename) functions to generate files that further split the dataset per action for confidence and perceived usefulness, to be compared using the R scripts.

In [2]:
def split_on_gender():
    """
    Function to split the samples in the raw data, according to the participants' genders.
    
    Args: none.

    Returns: none.
    """
    # read the demographic data to a dataframe
    df1 = pd.read_csv("../../../data/anonymised_data/anonymised_data_demographic.csv")
    
    # read the state, action, next state, rewards data to a different dataframe
    df2 = pd.read_csv("../../../data/anonymised_data/anonymised_data_final.csv")
    
    # get the ids of participants whose gender is "Man (including Trans Male/Trans Man)"
    df_man = df1[df1["Gender"] == "Man (including Trans Male/Trans Man)"]["Participant id"]
    # get the ids of participants whose gender is "Woman (including Trans Female/Trans Woman)" or "Non-binary (would like to give more detail)"
    df_woman_and_non_binary = df1[df1["Gender"].isin(
        ["Woman (including Trans Female/Trans Woman)", "Non-binary (would like to give more detail)"])][
        "Participant id"]
    
    # select the participants whose gender is "Man (including Trans Male/Trans Man)"
    df_data_man = df2[df2["prolific_id"].isin(df_man)]
    # select the participants whose gender is "Woman (including Trans Female/Trans Woman)" or "Non-binary (would like to give more detail)"
    df_data_woman_and_non_binary = df2[df2["prolific_id"].isin(df_woman_and_non_binary)]
    
    # save the dataframes to csv files
    df_data_man.to_csv("../../../data/anonymised_data/anonymised_data_man_2.csv", index=False)
    df_data_woman_and_non_binary.to_csv("../../../data/anonymised_data/anonymised_data_woman_and_non_binary_2.csv", index=False)
In [3]:
def split_on_ttm():
    """
    Function to split the samples in the raw data, according to the participants' Transtheoretical model (TTM) stage.
    
    Args: none.

    Returns: none.
    """
    # read the prescreening data to a dataframe
    df1 = pd.read_csv("../../../data/anonymised_data/anonymised_data_prescreening.csv")
    
    # read the state, action, next state, rewards data to a different dataframe
    df2 = pd.read_csv("../../../data/anonymised_data/anonymised_data_final.csv")
    
    # get the ids of participants whose TTM stage is "Contemplating"
    df_contemplating = df1[df1["TTM stage"] == "Contemplating"]["Prolific id"]
    # get the ids of participants whose TTM stage is "Preparing"
    df_preparing = df1[df1["TTM stage"] == "Preparing"]["Prolific id"]
    
    # select the participants whose TTM stage is "Contemplating"
    df_data_contemplating = df2[df2["prolific_id"].isin(df_contemplating)]
    # select the participants whose TTM stage is "Preparing"
    df_data_preparing = df2[df2["prolific_id"].isin(df_preparing)]
    
    # save the dataframes to csv files
    df_data_contemplating.to_csv("../../../data/anonymised_data/anonymised_data_contemplating_2.csv", index=False)
    df_data_preparing.to_csv("../../../data/anonymised_data/anonymised_data_preparing_2.csv", index=False)
In [4]:
def split_on_godin():
    """
    Function to split the samples in the raw data, according to the participants' scores in the Godin-Shephard leisure-time physical activity questionnaire.
    
    Args: none.

    Returns: none.
    """
    # read the prescreening data to a dataframe
    df1 = pd.read_csv("../../../data/anonymised_data/anonymised_data_prescreening.csv")
    
    # read the state, action, next state, rewards data to a different dataframe
    df2 = pd.read_csv("../../../data/anonymised_data/anonymised_data_final.csv")
    
    # get the ids of participants whose Godin score is less than 14 (corresponding to "Inactive/Sedentary")
    df_sedentary = df1[df1["Godin score"] < 14]["Prolific id"]
    # get the ids of participants whose Godin score is 14 or more (corresponding to "Moderately active")
    df_moderate = df1[df1["Godin score"] >= 14]["Prolific id"]
    
    # select the participants who were "Inactive/Sedentary"
    df_data_sedentary = df2[df2["prolific_id"].isin(df_sedentary)]
    # select the participants who were "Moderately active"
    df_data_moderate = df2[df2["prolific_id"].isin(df_moderate)]
    
    # save the dataframes to csv files
    df_data_sedentary.to_csv("../../../data/anonymised_data/anonymised_data_sedentary_2.csv", index=False)
    df_data_moderate.to_csv("../../../data/anonymised_data/anonymised_data_moderate_2.csv", index=False)
In [5]:
def split_on_confidence_before():
    """
    Function to split the samples in the raw data, according to the participants' confidence in the pre-screening questionnaire.
    
    Args: none.

    Returns: none.
    """
    # read the prescreening data to a dataframe
    df1 = pd.read_csv("../../../data/anonymised_data/anonymised_data_prescreening.csv")
    
    # read the state, action, next state, rewards data to a different dataframe
    df2 = pd.read_csv("../../../data/anonymised_data/anonymised_data_final.csv")

    # get the ids of participants whose confidence is less than 6 (the lowest 50th percentile)
    df_low_confidence = df1[df1["Confidence"] < 6]["Prolific id"]
    # get the ids of participants whose confidence is 6 or more (the highest 50th percentile)
    df_high_confidence = df1[df1["Confidence"] >= 6]["Prolific id"]
    
    # select the participants whose confidence is in the lowest 50th percentile
    df_data_low_confidence = df2[df2["prolific_id"].isin(df_low_confidence)]
    # select the participants whose confidence is in the highest 50th percentile
    df_data_high_confidence = df2[df2["prolific_id"].isin(df_high_confidence)]
    
    # save the dataframes to csv files
    df_data_low_confidence.to_csv("../../../data/anonymised_data/anonymised_data_low_confidence_2.csv", index=False)
    df_data_high_confidence.to_csv("../../../data/anonymised_data/anonymised_data_high_confidence_2.csv", index=False)
In [6]:
def split_on_perceived_usefulness_before():
    """
    Function to split the samples in the raw data, according to the participants' perceived usefulness in the pre-screening questionnaire.
    
    Args: none.

    Returns: none.
    """
    # read the prescreening data to a dataframe
    df1 = pd.read_csv("../../../data/anonymised_data/anonymised_data_prescreening.csv")

    # read the state, action, next state, rewards data to a different dataframe
    df2 = pd.read_csv("../../../data/anonymised_data/anonymised_data_final.csv")

    # get the ids of participants whose perceived usefulness is less than 7 (the lowest 50th percentile)
    df_low_perceived_usefulness = df1[df1["Perceived usefulness"] < 7]["Prolific id"]
    # get the ids of participants whose perceived usefulness is 7 or more (the highest 50th percentile)
    df_high_perceived_usefulness = df1[df1["Perceived usefulness"] >= 7]["Prolific id"]

    # select the participants whose perceived usefulness is in the lowest 50th percentile
    df_data_low_perceived_usefulness = df2[df2["prolific_id"].isin(df_low_perceived_usefulness)]
    # select the participants whose perceived usefulness is in the highest 50th percentile
    df_data_high_perceived_usefulness = df2[df2["prolific_id"].isin(df_high_perceived_usefulness)]
    
    # save the dataframes to csv files
    df_data_low_perceived_usefulness.to_csv("../../../data/anonymised_data/anonymised_data_low_perceived_usefulness_2.csv", index=False)
    df_data_high_perceived_usefulness.to_csv("../../../data/anonymised_data/anonymised_data_high_perceived_usefulness_2.csv", index=False)
In [7]:
def split_on_attitude_before():
    """
    Function to split the samples in the raw data, according to the participants' attitude in the pre-screening questionnaire.
    
    Args: none.

    Returns: none.
    """
    # read the prescreening data to a dataframe
    df1 = pd.read_csv("../../../data/anonymised_data/anonymised_data_prescreening.csv")
    
    # read the state, action, next state, rewards data to a different dataframe
    df2 = pd.read_csv("../../../data/anonymised_data/anonymised_data_final.csv")
    
    # get the ids of participants whose attitude is less than 8 (the lowest 50th percentile)
    df_low_attitude = df1[df1["Attitude"] < 8]["Prolific id"]
    # get the ids of participants whose attitude is 8 or more (the highest 50th percentile)
    df_high_attitude = df1[df1["Attitude"] >= 8]["Prolific id"]
    
    # select the participants whose attitude is in the lowest 50th percentile
    df_data_low_attitude = df2[df2["prolific_id"].isin(df_low_attitude)]
    # select the participants whose attitude is in the highest 50th percentile
    df_data_high_attitude = df2[df2["prolific_id"].isin(df_high_attitude)]
    
    # save the dataframes to csv files
    df_data_low_attitude.to_csv("../../../data/anonymised_data/anonymised_data_low_attitude_2.csv", index=False)
    df_data_high_attitude.to_csv("../../../data/anonymised_data/anonymised_data_high_attitude_2.csv", index=False)
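
The six split functions above all follow the same read, filter, save pattern. That pattern could be expressed once with a generic helper along the following lines (a sketch; the function name, signature, and output path handling are illustrative and not part of the original notebook):

```python
import pandas as pd

def split_on_predicate(source_csv, data_csv, id_col, predicate, out_csv):
    """Generic split helper (a sketch; name and signature are illustrative).

    source_csv: csv with the attribute to split on (demographic or prescreening)
    data_csv:   the state, action, next state, rewards csv
    id_col:     name of the participant-id column in source_csv
    predicate:  function mapping the source dataframe to a boolean mask
    out_csv:    path of the output csv
    """
    df1 = pd.read_csv(source_csv)
    df2 = pd.read_csv(data_csv)
    # ids of the participants selected by the predicate
    ids = df1[predicate(df1)][id_col]
    # keep only the samples belonging to those participants
    df2[df2["prolific_id"].isin(ids)].to_csv(out_csv, index=False)
```

For example, split_on_godin() would correspond to two calls with predicates lambda d: d["Godin score"] < 14 and lambda d: d["Godin score"] >= 14.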

Define functions for splitting the dataset per action and per feature.

In [8]:
def effects_of_actions_individual(filename):
    """
    Function to create individual files which contain the data before and after using an action, for each action, and for each feature.
    
    Args: filename - the name of the file used as input.

    Returns: none.
    """
    
    # define lists for each feature (c - confidence, pu - perceived usefulness),
    # for each action (changes to plan 1, changes to plan 2, explain planning, identify barriers, deal with barriers, and show testimonials),
    # and for before and after the action was done
    c_changes_to_plan_1_before = []
    c_changes_to_plan_1_after = []

    c_changes_to_plan_2_before = []
    c_changes_to_plan_2_after = []

    c_explain_planning_before = []
    c_explain_planning_after = []

    c_identify_barriers_before = []
    c_identify_barriers_after = []

    c_deal_with_barriers_before = []
    c_deal_with_barriers_after = []

    c_show_testimonials_before = []
    c_show_testimonials_after = []

    pu_changes_to_plan_1_before = []
    pu_changes_to_plan_1_after = []

    pu_changes_to_plan_2_before = []
    pu_changes_to_plan_2_after = []

    pu_explain_planning_before = []
    pu_explain_planning_after = []

    pu_identify_barriers_before = []
    pu_identify_barriers_after = []

    pu_deal_with_barriers_before = []
    pu_deal_with_barriers_after = []

    pu_show_testimonials_before = []
    pu_show_testimonials_after = []
    
    # open the file
    with open(filename) as f:
        # read all lines in the file
        lines = f.readlines()
        
        # keep track of the previous id
        # initialised as empty for the first sample
        old_id = ""
        
        # also keep track of how many changes to the plans were done, 
        # to know if the data should be placed in the changes_to_plan_1 or changes_to_plan_2 list
        changes_done = 0
        
        # loop over all lines in the file, skipping the first one which has the column names
        for line in lines[1:]:
            
            # get the current id
            current_id = line.split(",")[0]
            
            # split the line
            split = line.split(",\"")
            
            # check if the current id is the same as the old one to see if we have finished processing a person's samples
            if current_id != old_id:
                # if we are processing a new person, set the changes done to plans to 0 and the old id to the current one
                changes_done = 0
                old_id = current_id
            
            # get state before, action, state after data from the line
            state_before = split[1].split("\",")[0]
            action = split[1].split("\",")[1]
            state_after = split[2].split("\",")[0]
            
            # split the state before
            split_before = state_before.split(", ")
            
            # get its confidence and perceived usefulness
            c_before = split_before[1][1:-1]
            pu_before = split_before[2][1:-1]
            
            # split the state after
            split_after = state_after.split(", ")
            
            # get its confidence and perceived usefulness
            c_after = split_after[1][1:-1]
            pu_after = split_after[2][1:-1]
            
            # depending on the action, add the confidence and perceived usefulness before and after the action to the corresponding lists
            if action == "explain_planning":
                c_explain_planning_before.append(c_before)
                pu_explain_planning_before.append(pu_before)

                c_explain_planning_after.append(c_after)
                pu_explain_planning_after.append(pu_after)

            if action == "identify_barriers":
                c_identify_barriers_before.append(c_before)
                pu_identify_barriers_before.append(pu_before)

                c_identify_barriers_after.append(c_after)
                pu_identify_barriers_after.append(pu_after)

            if action == "deal_with_barriers":
                c_deal_with_barriers_before.append(c_before)
                pu_deal_with_barriers_before.append(pu_before)

                c_deal_with_barriers_after.append(c_after)
                pu_deal_with_barriers_after.append(pu_after)

            if action == "show_testimonials":
                c_show_testimonials_before.append(c_before)
                pu_show_testimonials_before.append(pu_before)

                c_show_testimonials_after.append(c_after)
                pu_show_testimonials_after.append(pu_after)
            
            # when the action is changes to plan, we need to check if it is the first or the second change
            if action == "changes_to_plan":
                
                # using the value of the changes_done variable, we can determine this
                if changes_done == 0:
                    c_changes_to_plan_1_before.append(c_before)
                    pu_changes_to_plan_1_before.append(pu_before)

                    c_changes_to_plan_1_after.append(c_after)
                    pu_changes_to_plan_1_after.append(pu_after)

                    changes_done += 1
                elif changes_done == 1:
                    c_changes_to_plan_2_before.append(c_before)
                    pu_changes_to_plan_2_before.append(pu_before)

                    c_changes_to_plan_2_after.append(c_after)
                    pu_changes_to_plan_2_after.append(pu_after)

                    changes_done += 1
    
    # save all the lists to csv files
    np.savetxt("../../../data/c_changes_to_plan_1_before.csv", c_changes_to_plan_1_before, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/c_changes_to_plan_1_after.csv", c_changes_to_plan_1_after, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/pu_changes_to_plan_1_before.csv", pu_changes_to_plan_1_before, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/pu_changes_to_plan_1_after.csv", pu_changes_to_plan_1_after, delimiter=", ", fmt='% s')

    np.savetxt("../../../data/c_changes_to_plan_2_before.csv", c_changes_to_plan_2_before, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/c_changes_to_plan_2_after.csv", c_changes_to_plan_2_after, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/pu_changes_to_plan_2_before.csv", pu_changes_to_plan_2_before, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/pu_changes_to_plan_2_after.csv", pu_changes_to_plan_2_after, delimiter=", ", fmt='% s')

    np.savetxt("../../../data/c_explain_planning_before.csv", c_explain_planning_before, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/c_explain_planning_after.csv", c_explain_planning_after, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/pu_explain_planning_before.csv", pu_explain_planning_before, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/pu_explain_planning_after.csv", pu_explain_planning_after, delimiter=", ", fmt='% s')

    np.savetxt("../../../data/c_identify_barriers_before.csv", c_identify_barriers_before, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/c_identify_barriers_after.csv", c_identify_barriers_after, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/pu_identify_barriers_before.csv", pu_identify_barriers_before, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/pu_identify_barriers_after.csv", pu_identify_barriers_after, delimiter=", ", fmt='% s')

    np.savetxt("../../../data/c_deal_with_barriers_before.csv", c_deal_with_barriers_before, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/c_deal_with_barriers_after.csv", c_deal_with_barriers_after, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/pu_deal_with_barriers_before.csv", pu_deal_with_barriers_before, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/pu_deal_with_barriers_after.csv", pu_deal_with_barriers_after, delimiter=", ", fmt='% s')

    np.savetxt("../../../data/c_show_testimonials_before.csv", c_show_testimonials_before, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/c_show_testimonials_after.csv", c_show_testimonials_after, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/pu_show_testimonials_before.csv", pu_show_testimonials_before, delimiter=", ", fmt='% s')
    np.savetxt("../../../data/pu_show_testimonials_after.csv", pu_show_testimonials_after, delimiter=", ", fmt='% s')


def effects_of_actions_start_and_end(filename):
    """
    Function to create individual files which contain the data before and after the entire interaction, for each feature.
    
    Args: filename - the name of the file used as input.

    Returns: none.
    """
    
    # create lists for confidence before and after, and for perceived usefulness before and after the interaction
    c_before_list = []
    c_after_list = []

    pu_before_list = []
    pu_after_list = []
    
    # open the file
    with open(filename) as f:
        # read all lines
        lines = f.readlines()
        
        # keep a variable which indicates if we are processing the first sample of a person
        got_start = False
        # keep a variable which indicates if we have reached the end of the last existing sample
        ending = False
        
        # loop over all lines in the file, skipping the first one which has the column names
        for index, line in enumerate(lines[1:]):
            
            # check if index+2 is still a valid index for a line in the list of lines
            # this is essentially checking if a next line still exists
            # normally, the index of the next line would be index+1, but since we skip the first line and enumerate starts from 0, 
            # the location of the current line in the lines list is index+1 so the one after that is index+2
            if index + 2 < len(lines):
                # if there is still a next line after
                # get the current id
                current_id = line.split(",")[0]
                # get the next id
                next_id = lines[index+2].split(",")[0]
            else:
                # if there is no line after this one, then we are processing the last sample
                ending = True
            
            # split the line
            split = line.split(",\"")
            
            # get the state before
            state_before = split[1].split("\",")[0]
            
            # get the state after
            state_after = split[2].split("\",")[0]
            
            # split the state before
            split_before = state_before.split(", ")
            
            # get the confidence and perceived usefulness before
            c_before = split_before[1][1:-1]
            pu_before = split_before[2][1:-1]
            
            # split the state after
            split_after = state_after.split(", ")
            
            # get the confidence and perceived usefulness after
            c_after = split_after[1][1:-1]
            pu_after = split_after[2][1:-1]
            
            # got_start is False when we are processing the first sample of a person
            if not got_start:
                # then the sample has to be added to the before list for confidence and perceived usefulness
                c_before_list.append(c_before)
                pu_before_list.append(pu_before)
                # set the got_start variable to true to make sure we skip the other samples from this person
                got_start = True
            
            # if the next id is different from the current one, then we are processing the last sample of a person
            # alternatively, if there is no next id, then the ending variable is true, meaning we are processing the last sample of the last person
            if current_id != next_id or ending:
                # in either case, we need to add the sample to the after list for confidence and perceived usefulness
                c_after_list.append(c_after)
                pu_after_list.append(pu_after)
                # we set got_start to false again, since the next sample is one from a new person and we need to add it to the before list
                got_start = False
        
        # save the lists to csv files
        np.savetxt("../../../data/c_before.csv", c_before_list, delimiter=", ", fmt='% s')
        np.savetxt("../../../data/c_after.csv", c_after_list, delimiter=", ", fmt='% s')
        np.savetxt("../../../data/pu_before.csv", pu_before_list, delimiter=", ", fmt='% s')
        np.savetxt("../../../data/pu_after.csv", pu_after_list, delimiter=", ", fmt='% s')
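
The string handling in both functions assumes each data row has the shape: prolific id, quoted state list, action name, quoted next-state list, reward. A minimal walk-through of the extraction on a hypothetical row (the state contents here are an assumption for illustration only):

```python
# hypothetical data row in the format the parsing code above expects
line = "p1,\"[1, '5', '6', 0]\",explain_planning,\"[1, '7', '8', 0]\",0.4"

# splitting on ," separates the id, the state+action chunk, and the next-state chunk
split = line.split(",\"")
state_before = split[1].split("\",")[0]   # [1, '5', '6', 0]
action = split[1].split("\",")[1]         # explain_planning
state_after = split[2].split("\",")[0]    # [1, '7', '8', 0]

# confidence and perceived usefulness are the second and third entries;
# the [1:-1] slice strips the surrounding single quotes
c_before = state_before.split(", ")[1][1:-1]   # 5
pu_before = state_before.split(", ")[2][1:-1]  # 6
```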

Obtaining the results

The effects_of_actions_individual(filename) and effects_of_actions_start_and_end(filename) functions can take as input the entire dataset. For example:

In [9]:
effects_of_actions_individual("../../../data/anonymised_data/anonymised_data_final.csv")

The files generated are placed in the data folder. To check the effects of the actions, use the R script for the corresponding feature (confidence or perceived usefulness).

They can also take as input a dataset that was split on one of the prescreening attributes. Note that running the function overwrites the files that already exist in the data folder. To check the effects for each of the two splits of the dataset, run the effects_of_actions_individual(filename) function on the first split and run the R script for the feature you are interested in; then run effects_of_actions_individual(filename) on the other split and run the R script again. An example for checking the effects for low and high attitude is shown below.
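
If you want to keep the outputs of one run before starting the next, one option is to copy the generated csv files into a per-split subfolder first. A sketch (the helper name and the folder layout are illustrative, not part of the original notebook):

```python
import os
import shutil

def archive_outputs(data_dir, tag):
    """Copy the generated csv files from data_dir into data_dir/<tag>,
    so the next call to effects_of_actions_individual(filename) does not
    overwrite them. (A sketch; the name and layout are illustrative.)"""
    dest = os.path.join(data_dir, tag)
    os.makedirs(dest, exist_ok=True)
    for name in os.listdir(data_dir):
        if name.endswith(".csv"):
            shutil.copy(os.path.join(data_dir, name), dest)
```

For example, archive_outputs("../../../data", "low_attitude") after the first run would preserve the low-attitude files.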

In [10]:
effects_of_actions_individual("../../../data/anonymised_data/anonymised_data_low_attitude.csv")

Now, the files in the data folder contain the data for only the low attitude samples. At this point, you should run the R script (e.g. bayesian_t_test_confidence.R) to check if individual actions have an effect for people whose attitude was low in the pre-screening. Once you are done checking, you can then run the effects_of_actions_individual(filename) on the samples for which attitude was high.

In [11]:
effects_of_actions_individual("../../../data/anonymised_data/anonymised_data_high_attitude.csv")

Now, the files in the data folder contain the data for only the high attitude samples. At this point, you should run the R script (e.g. bayesian_t_test_confidence.R) to check if individual actions have an effect for people whose attitude was high in the pre-screening.

To ensure that, when running the R scripts, you get similar results to the ones reported in the manuscript, make sure you run the cell below before trying to replicate the results.

In [12]:
effects_of_actions_individual("../../../data/anonymised_data/anonymised_data_final.csv")

The files for all the splits that the functions above can produce are already placed in the data folder. Running the functions again produces files with the suffix _2, which can be compared against the files provided to verify that the results are correct. The cell below does that.

In [13]:
split_on_gender()
split_on_ttm()
split_on_godin()
split_on_confidence_before()
split_on_perceived_usefulness_before()
split_on_attitude_before()
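
To verify that a regenerated _2 file matches the file provided with the repository, a simple pandas comparison can be used. A sketch (the helper name is illustrative; it assumes both files exist in the given directory):

```python
import pandas as pd

def compare_split(name, data_dir="../../../data/anonymised_data"):
    """Return True when a regenerated split (suffix _2) matches the file
    provided with the repository. (A sketch; assumes both files exist.)"""
    provided = pd.read_csv(f"{data_dir}/anonymised_data_{name}.csv")
    regenerated = pd.read_csv(f"{data_dir}/anonymised_data_{name}_2.csv")
    # DataFrame.equals checks shape, column names, dtypes, and values
    return provided.equals(regenerated)
```

For example, compare_split("man") would check the gender split for men against its regenerated counterpart.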