Author: Andrei Stefan
Date: 13-11-2023
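Required files: data/thematic_analysis_demotivating.csv, data/thematic_analysis_demotivating_full.csv, data/thematic_analysis_motivating.csv, data/thematic_analysis_motivating_full.csv, data/thematic_analysis_demotivating_examples_for_second_coder.xlsx, data/thematic_analysis_motivating_examples_for_second_coder.xlsx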
Output files: no output files
This file contains the code to reproduce the results for the thematic analysis. The corresponding figures and tables are: Table I.2, Table I.3, Figure I.1.
# import all required packages
import pandas as pd
from sklearn.metrics import cohen_kappa_score
In the thematic analysis, we looked at open-ended answers to questions about what people found motivating and demotivating about talking to the virtual coach. Based on these answers, 10 codes were created for the motivating aspects and 9 codes for the demotivating aspects. A second coder, after being trained on a few examples, independently coded the open-ended answers. The code below first counts how often the two coders disagreed on the training examples, and then computes the agreement between the two coders through Cohen's Kappa and Brennan-Prediger's Kappa. Additionally, it counts how many times the original coder assigned each code, to help determine which codes should be the focus of the analysis and which are scarcer, and thus secondary.
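As a rough illustration of the two agreement measures named above, the sketch below computes both from scratch for one code. Both share the form kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement; they differ only in the chance term p_e. Brennan-Prediger fixes p_e at 1 / q for q categories (the notebook passes the number of codes as q), while Cohen estimates p_e from each coder's own label frequencies. The function names and the toy ratings are hypothetical and are not part of the analysis code.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Fraction of items on which both coders gave the same label."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def brennan_prediger_kappa(ratings_a, ratings_b, n_categories):
    """Kappa with a fixed chance agreement of 1 / n_categories."""
    p_o = percent_agreement(ratings_a, ratings_b)
    p_e = 1 / n_categories
    return (p_o - p_e) / (1 - p_e)


def cohen_kappa(ratings_a, ratings_b):
    """Kappa with chance agreement estimated from each coder's label frequencies.

    Undefined (division by zero) when both coders use a single identical label throughout.
    """
    n = len(ratings_a)
    p_o = percent_agreement(ratings_a, ratings_b)
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# hypothetical toy data for one code: 1 = code assigned, 0 = code not assigned
coder_1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
coder_2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

cohen = cohen_kappa(coder_1, coder_2)
# assumption: 10 categories, mirroring the number of motivating codes
bp = brennan_prediger_kappa(coder_1, coder_2, n_categories=10)
print(f"Cohen's Kappa: {cohen:.3f}, Brennan-Prediger Kappa: {bp:.3f}")
```

With these toy ratings the coders agree on 9 of 10 items, yet Cohen's Kappa comes out noticeably lower than Brennan-Prediger's because the code is rare, which is one reason to report both.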
def disagreement():
    """
    Function to calculate how many disagreements there were between the two coders for the training data.
    Args: none.
    Returns: none.
    """
    # read the "training examples" sheet of the excel file for the motivating examples as a dataframe
    df_motivating = pd.read_excel("../../../data/thematic_analysis_motivating_examples_for_second_coder.xlsx", "training examples")
    # get the original codes as a list
    original_motivating = df_motivating['Original code numbers'].tolist()
    # get the second coder's codes as a list
    second_coder_motivating = df_motivating['Code numbers'].tolist()
    # initialise number of agreed examples to 0
    agreements = 0
    # loop over the original and second coder's codes
    for (original, second_coder) in zip(original_motivating, second_coder_motivating):
        # if the codes are the same
        if original == second_coder:
            # count an agreement
            agreements += 1
    # print the result
    print(f"For the motivating factors, the two coders disagreed on {len(original_motivating) - agreements} responses out of {len(original_motivating)} responses total.")
    # read the "training examples" sheet of the excel file for the demotivating examples as a dataframe
    df_demotivating = pd.read_excel("../../../data/thematic_analysis_demotivating_examples_for_second_coder.xlsx", "training examples")
    # get the original codes as a list
    original_demotivating = df_demotivating['Original code numbers'].tolist()
    # get the second coder's codes as a list
    second_coder_demotivating = df_demotivating['Code numbers'].tolist()
    # reset the number of agreed examples to 0
    agreements = 0
    # loop over the original and second coder's codes
    for (original, second_coder) in zip(original_demotivating, second_coder_demotivating):
        # if the codes are the same
        if original == second_coder:
            # count an agreement
            agreements += 1
    # print the result (using the demotivating list for the totals)
    print(f"For the demotivating factors, the two coders disagreed on {len(original_demotivating) - agreements} responses out of {len(original_demotivating)} responses total.")
disagreement()
For the motivating factors, the two coders disagreed on 2 responses out of 12 responses total.
For the demotivating factors, the two coders disagreed on 0 responses out of 12 responses total.
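The disagreement count compares the raw cell contents, so two codings that list the same codes in a different order would count as a disagreement. A possible variation, sketched below under the assumption that the order of codes within a response should not matter, compares the sets of codes instead. The helper names and the toy codings are hypothetical.

```python
def parse_codes(cell):
    """Turn a cell such as "1, 3" (or the number 4) into a set of code strings."""
    return {part.strip() for part in str(cell).split(",") if part.strip()}


def count_disagreements(codes_a, codes_b):
    """Count the responses for which the two coders assigned different sets of codes."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must have coded the same responses")
    return sum(parse_codes(a) != parse_codes(b) for a, b in zip(codes_a, codes_b))


# hypothetical toy codings; cells are assumed to hold strings or integers
original = ["1", "2, 5", "10", 4]
second = ["1", "5, 2", "1", 4]

n_disagreements = count_disagreements(original, second)
print(f"The two coders disagreed on {n_disagreements} responses out of {len(original)} responses total.")
```

Here only the third response counts as a disagreement; "2, 5" and "5, 2" are treated as the same coding.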
def thematic_analysis(filename, number_of_codes):
    """
    Function to make the computations necessary for the thematic analysis.
    Args: filename - the name of the file containing the data,
          number_of_codes - the number of codes that were used when coding the data.
    Returns: none.
    """
    # read the file into a dataframe
    df = pd.read_csv(filename)
    # get the original codes as a list
    original = df['Original code numbers'].tolist()
    # get the second coder's codes as a list
    second_coder = df['Code numbers'].tolist()
    # initialise an empty dict that will count how many times each code appears
    code_count = {}
    # start the count for each possible code at 0
    for i in range(1, number_of_codes+1):
        code_count[i] = 0
    # read the file needed for counting the codes into a dataframe
    df_count = pd.read_csv(f"{filename[:-4]}_full.csv")
    # get the original coding of all the responses (the list called original only has the responses which both coders coded,
    # while this one also contains the responses used for training and testing the second coder)
    codes_to_count = df_count['Original code numbers'].tolist()
    # initialise an empty list which will hold the codes that should be kept in the final analysis
    codes_to_keep = []
    # initialise 2 empty lists which will hold all the codes that the two coders assigned,
    # but as zeroes and ones, indicating if a specific code applies for the respective message or not
    coder_1_all = []
    coder_2_all = []
    # loop over all possible codes
    for i in range(1, number_of_codes+1):
        # loop over the list with all responses
        for coding in codes_to_count:
            # if there are multiple codes for a response
            if "," in coding:
                # then split them up
                codes = coding.split(", ")
            # otherwise, there is only one code so keep the original
            # (note: codes is then a string, so the check below is a substring test, e.g. "1" also matches "10")
            else:
                codes = coding
            # if the current code we are checking is in the list of assigned codes
            if str(i) in codes:
                # then increment the count of the current code once
                code_count[i] += 1
        # initialise two empty lists which will hold zeroes and ones that indicate
        # whether or not the current code we are checking is present in each response, according to each coder
        coder_1 = []
        coder_2 = []
        # loop over the two coders' codings
        for coding_original, coding_second in zip(original, second_coder):
            # if there are multiple codes
            if "," in coding_original:
                # then split them
                codes_original = coding_original.split(", ")
            # otherwise, there is only one code so keep the original
            else:
                codes_original = coding_original
            # if the current code we are checking is in the list of assigned codes
            if str(i) in codes_original:
                # then append a 1 to the list
                coder_1.append(1)
            # otherwise, append a 0 to the list
            else:
                coder_1.append(0)
            # if there are multiple codes
            if "," in coding_second:
                # then split them
                codes_second = coding_second.split(", ")
            # otherwise, there is only one code so keep the original
            else:
                codes_second = coding_second
            # if the current code we are checking is in the list of assigned codes
            if str(i) in codes_second:
                # then append a 1 to the list
                coder_2.append(1)
            # otherwise, append a 0 to the list
            else:
                coder_2.append(0)
        # initialise the number of agreements to 0
        percent_agreement_i = 0
        # count how many codes the two coders agree on
        for (item_original, item_second) in zip(coder_1, coder_2):
            percent_agreement_i += item_original == item_second
        # divide the number of agreements by the number of responses to obtain a percentage
        percent_agreement_i = percent_agreement_i / len(coder_1)
        # calculate Brennan-Prediger's Kappa
        bp_kappa_i = (percent_agreement_i - 1 / number_of_codes) / (1 - 1 / number_of_codes)
        # calculate Cohen's Kappa
        cohen_score_i = cohen_kappa_score(coder_1, coder_2)
        # print the two Kappas
        print(f"For code {i}, Cohen's Kappa is {cohen_score_i}")
        print(f"For code {i}, Brennan-Prediger Kappa is {bp_kappa_i}")
        # if both Kappas are above the threshold
        if cohen_score_i >= 0.4 and bp_kappa_i >= 0.4:
            # add the code to the list of codes to keep
            codes_to_keep.append(i)
        # add all the zeroes and ones for this code of the first coder to the list of all zeroes and ones for the first coder
        for item in coder_1:
            coder_1_all.append(item)
        # do the same for the second coder
        for item in coder_2:
            coder_2_all.append(item)
    # now compute the two Kappas for all the responses that both coders coded
    # initialise the number of agreements to 0
    percent_agreement = 0
    # count how many codes the two coders agree on
    for (item_original, item_second) in zip(coder_1_all, coder_2_all):
        percent_agreement += item_original == item_second
    # divide the number of agreements by the number of zero/one entries to obtain a percentage
    percent_agreement = percent_agreement / len(coder_1_all)
    # calculate Brennan-Prediger's Kappa
    bp_kappa = (percent_agreement - 1 / number_of_codes) / (1 - 1 / number_of_codes)
    # calculate Cohen's Kappa
    cohen_score = cohen_kappa_score(coder_1_all, coder_2_all)
    # print the two Kappas
    print(f"For all responses, Cohen's Kappa is {cohen_score}")
    print(f"For all responses, Brennan-Prediger Kappa is {bp_kappa}")
    # loop over the dict with the number of times each code appears
    for k, v in code_count.items():
        # print how many times the code appears
        print(f"Code {k} appears {v} times")
        # if the code was assigned fewer times than 10% of the number of responses that both coders coded
        if v < 0.1 * len(df) and k in codes_to_keep:
            # then remove it from the list of codes to keep
            codes_to_keep.remove(k)
    # print what codes are kept in the final thematic analysis
    print(f"Codes in the final thematic analysis: {codes_to_keep}")
The thematic analysis code can be run either for motivating or for demotivating factors. You need to pass the name of the file and the number of codes for the corresponding file (10 for the motivating factors and 9 for the demotivating factors).
thematic_analysis("../../../data/thematic_analysis_motivating.csv", 10)
For code 1, Cohen's Kappa is 0.7046843177189409
For code 1, Brennan-Prediger Kappa is 0.8722860791826309
For code 2, Cohen's Kappa is 0.5413005272407733
For code 2, Brennan-Prediger Kappa is 0.9233716475095786
For code 3, Cohen's Kappa is 0.9149560117302052
For code 3, Brennan-Prediger Kappa is 0.9744572158365262
For code 4, Cohen's Kappa is 1.0
For code 4, Brennan-Prediger Kappa is 1.0
For code 5, Cohen's Kappa is 1.0
For code 5, Brennan-Prediger Kappa is 1.0
For code 6, Cohen's Kappa is 0.903010033444816
For code 6, Brennan-Prediger Kappa is 0.9872286079182631
For code 7, Cohen's Kappa is 1.0
For code 7, Brennan-Prediger Kappa is 1.0
For code 8, Cohen's Kappa is 0.9715964740450539
For code 8, Brennan-Prediger Kappa is 0.9872286079182631
For code 9, Cohen's Kappa is 0.8512820512820513
For code 9, Brennan-Prediger Kappa is 0.9872286079182631
For code 10, Cohen's Kappa is 0.7883211678832117
For code 10, Brennan-Prediger Kappa is 0.9744572158365262
For all responses, Cohen's Kappa is 0.8728312678741659
For all responses, Brennan-Prediger Kappa is 0.9706257982120051
Code 1 appears 31 times
Code 2 appears 13 times
Code 3 appears 18 times
Code 4 appears 10 times
Code 5 appears 7 times
Code 6 appears 9 times
Code 7 appears 9 times
Code 8 appears 27 times
Code 9 appears 5 times
Code 10 appears 5 times
Codes in the final thematic analysis: [1, 2, 3, 4, 6, 7, 8]
thematic_analysis("../../../data/thematic_analysis_demotivating.csv", 9)
For code 1, Cohen's Kappa is 1.0
For code 1, Brennan-Prediger Kappa is 1.0
For code 2, Cohen's Kappa is 0.794392523364486
For code 2, Brennan-Prediger Kappa is 0.9872159090909092
For code 3, Cohen's Kappa is 1.0
For code 3, Brennan-Prediger Kappa is 1.0
For code 4, Cohen's Kappa is 0.9735576923076923
For code 4, Brennan-Prediger Kappa is 0.9872159090909092
For code 5, Cohen's Kappa is 0.7884615384615384
For code 5, Brennan-Prediger Kappa is 0.9744318181818182
For code 6, Cohen's Kappa is 1.0
For code 6, Brennan-Prediger Kappa is 1.0
For code 7, Cohen's Kappa is 0.6430020283975659
For code 7, Brennan-Prediger Kappa is 0.9488636363636364
For code 8, Cohen's Kappa is 1.0
For code 8, Brennan-Prediger Kappa is 1.0
For code 9, Cohen's Kappa is 1.0
For code 9, Brennan-Prediger Kappa is 1.0
For all responses, Cohen's Kappa is 0.9478260869565217
For all responses, Brennan-Prediger Kappa is 0.9886363636363638
Code 1 appears 39 times
Code 2 appears 6 times
Code 3 appears 4 times
Code 4 appears 31 times
Code 5 appears 7 times
Code 6 appears 6 times
Code 7 appears 8 times
Code 8 appears 5 times
Code 9 appears 4 times
Codes in the final thematic analysis: [1, 4]
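The rule that decides which codes end up in the final analysis combines two conditions: both Kappas must be at least 0.4, and the code must have been assigned at least as often as 10% of the number of double-coded responses. A minimal sketch of that rule is given below; the function `select_codes`, its parameter names, and all the numbers in the example are hypothetical and chosen only for illustration.

```python
def select_codes(cohen, brennan_prediger, counts, n_responses,
                 kappa_threshold=0.4, min_share=0.1):
    """Keep codes that are both reliably coded and assigned often enough."""
    kept = []
    for code in sorted(counts):
        # both agreement measures must reach the threshold
        reliable = (cohen[code] >= kappa_threshold
                    and brennan_prediger[code] >= kappa_threshold)
        # the code must be assigned at least min_share * n_responses times
        frequent = counts[code] >= min_share * n_responses
        if reliable and frequent:
            kept.append(code)
    return kept


# hypothetical toy values, not taken from the data
cohen = {1: 0.70, 2: 0.35, 3: 0.91}
brennan_prediger = {1: 0.87, 2: 0.92, 3: 0.97}
counts = {1: 31, 2: 13, 3: 4}

kept = select_codes(cohen, brennan_prediger, counts, n_responses=50)
print(f"Codes in the final thematic analysis: {kept}")
```

In this toy example code 2 is dropped because its Cohen's Kappa is below 0.4, and code 3 is dropped because it was assigned fewer than 5 times (10% of 50).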