Thematic analysis¶

Author: Andrei Stefan
Date: 13-11-2023
Required files: data/thematic_analysis_demotivating.csv, data/thematic_analysis_motivating.csv
Output files: no output files

This file contains the code to reproduce the results for the thematic analysis. The corresponding figures and tables are: Table I.2, Table I.3, Figure I.1.

In [1]:
# import all required packages
import pandas as pd

from sklearn.metrics import cohen_kappa_score

Thematic analysis, Table H.2, Table H.3, and Figure H.1¶

In the thematic analysis, we looked at open-ended answers to questions about what people found motivating and demotivating about talking to the virtual coach. Based on these answers, 10 codes are created for the motivating aspects, and 9 codes are created for the demotivating aspects. A second coder was asked to additionally code the open-ended answers, after being trained on a few examples. The function below computes agreement between the two coders through Cohen's Kappa and Brennan-Prediger's Kappa. Additionally, it counts how many times the original coder assigned a code in the message to help determine which codes should be the focus of the analysis and which are more scarce, and thus secondary.

In [2]:
def disagreement():
    """
    Function to calculate how many disagreements there were between the two coders for the training data.
    
    Args: none.

    Returns: none.
    """
    
    # read the "training examples" sheet of the excel file for the motivating examples as a dataframe
    df_motivating = pd.read_excel("../../../data/thematic_analysis_motivating_examples_for_second_coder.xlsx", "training examples")
    
    # get the original codes as a list
    original_motivating = df_motivating['Original code numbers'].tolist()
    
    # get the second coder's codes as a list
    second_coder_motivating = df_motivating['Code numbers'].tolist()
    
    # initialise number of agreed examples to 0
    agreements = 0
    
    # loop over the original and second coder's codes
    for (original, second_coder) in zip (original_motivating, second_coder_motivating):
        # if the codes are the same
        if original == second_coder:
            # count an agreement
            agreements +=1
    
    # print the result
    print(f"For the motivating factors, the two coders disagreed on {len(original_motivating) - agreements} responses out of {len(original_motivating)} responses total.")
    
    # read the "training examples" sheet of the excel file for the demotivating examples as a dataframe
    df_demotivating = pd.read_excel("../../../data/thematic_analysis_demotivating_examples_for_second_coder.xlsx", "training examples")
    
    # get the original codes as a list
    original_demotivating = df_demotivating['Original code numbers'].tolist()
    
    # get the second coder's codes as a list
    second_coder_demotivating = df_demotivating['Code numbers'].tolist()
    
    # initialise number of agreed examples to 0
    agreements = 0
    
    # loop over the original and second coder's codes
    for (original, second_coder) in zip (original_demotivating, second_coder_demotivating):
        # if the codes are the same
        if original == second_coder:
            # count an agreement
            agreements +=1
    
    # print the result
    print(f"For the demotivating factors, the two coders disagreed on {len(original_motivating) - agreements} responses out of {len(original_motivating)} responses total.")
    
disagreement()
For the motivating factors, the two coders disagreed on 2 responses out of 12 responses total.
For the demotivating factors, the two coders disagreed on 0 responses out of 12 responses total.
In [3]:
def thematic_analysis(filename, number_of_codes):
    """
    Function to make the computations necessary for the thematic analysis.
    
    Args: filename - the name of the file containing the data,
          number_of_codes - the number of codes that were used when coding the data.

    Returns: none.
    """
    
    # read the file into a dataframe
    df = pd.read_csv(filename)
    
    # get the original codes as a list
    original = df['Original code numbers'].tolist()
    # get the second conder's codes as a list
    second_coder = df['Code numbers'].tolist()
    
    # initialise an empty dict that will count how many times each code appears
    code_count = {}
    
    # start the count for each possible code at 0
    for i in range(1, number_of_codes+1):
        code_count[i] = 0
    
    # read the file needed for counting the codes into a dataframe
    df_count = pd.read_csv(f"{filename[:-4]}_full.csv")
    
    # get the original coding of all the responses (the list called original only has the responses which both coders coded,
    # while this one also contains the responses used for training and testing the second coder)
    codes_to_count = df_count['Original code numbers'].tolist()
    
    # initialise an empty list which will hold the codes that should be kept in the final analysis
    codes_to_keep = []
    
    # initialise 2 empty lists which will hold all the codes that the two coders assigned,
    # but as zeroes and ones, indicating if a specific code applies for the respective message or not
    coder_1_all = []
    coder_2_all = []
    
    # loop over all possible codes
    for i in range(1, number_of_codes+1):
        
        # loop over the list with all responses
        for coding in codes_to_count:
            # if there are multiple codes for a response
            if "," in coding:
                # then split them up
                codes = coding.split(", ")
            # otherwise, there is only one code so keep the original
            else:
                codes = coding
            
            # if the current code we are checking is in the list of assigned codes
            if str(i) in codes:
                # then increment the count of the current code once
                code_count[i] += 1
        
        # initialise two empty lists which will hold zeroes and ones that indicate
        # whether or not the current code we are checking is present in each response, according to each coder
        coder_1 = []
        coder_2 = []
        
        # loop over the two coders' codings
        for coding_original, coding_second in zip(original, second_coder):
            # if there are multiple codes
            if "," in coding_original:
                # then split them
                codes_original = coding_original.split(", ")
            # otherwise, there is only one code so keep the original
            else:
                codes_original = coding_original
            
            # if the current code we are checking is in the list of assigned codes
            if str(i) in codes_original:
                # then increment append a 1 to the list
                coder_1.append(1)
            # otherwise, append a 0 to the list
            else:
                coder_1.append(0)
                
            # if there are multiple codes
            if "," in coding_second:
                codes_second = coding_second.split(", ")
            # otherwise, there is only one code so keep the original
            else:
                codes_second = coding_second
            
            # if the current code we are checking is in the list of assigned codes
            if str(i) in codes_second:
                # then increment append a 1 to the list
                coder_2.append(1)
            # otherwise, append a 0 to the list
            else:
                coder_2.append(0)
        
        # initialise the number of agreements to 0
        percent_agreement_i = 0
        
        # count how many codes the two coders agree on
        for (item_original, item_second) in zip(coder_1, coder_2):
            percent_agreement_i += item_original == item_second
        
        # divide the number of agreements by the number of responses to obtain a percentage
        percent_agreement_i = percent_agreement_i / len(coder_1)
        
        # calculate Brennan-Prediger's Kappa
        bp_kappa_i = (percent_agreement_i - 1 / number_of_codes) / (1 - 1 / number_of_codes)
        
        # calculate Cohen's Kappa
        cohen_score_i = cohen_kappa_score(coder_1, coder_2)
        
        # print the two Kappas
        print(f"For code {i}, Cohen's Kappa is {cohen_score_i}")
        print(f"For code {i}, Brennan-Prediger Kappa is {bp_kappa_i}")
        
        # if both Kappas are above the threshold
        if cohen_score_i >= 0.4 and bp_kappa_i >= 0.4:
            
            # add the code to the list of codes to keep
            codes_to_keep.append(i)
        
        # add all the zeroes and ones for this code of the first coder to the list of all zeroes and ones for the first coder
        for item in coder_1:
            coder_1_all.append(item)
        
        # do the same for the second coder
        for item in coder_2:
            coder_2_all.append(item)
    
    # now compute the two Kappas for all the responses that both coders coded
    
    # initialise the number of agreements to 0
    percent_agreement = 0
    
    # count how many codes the two coders agree on
    for (item_original, item_second) in zip(coder_1_all, coder_2_all):
        percent_agreement += item_original == item_second
    
    # divide the number of agreements by the number of responses to obtain a percentage
    percent_agreement = percent_agreement/len(coder_1_all)
    
    # calculate Brennan-Prediger's Kappa
    bp_kappa = (percent_agreement - 1 / number_of_codes) / (1 - 1 / number_of_codes)
    
    # calculate Cohen's Kappa
    cohen_score = cohen_kappa_score(coder_1_all, coder_2_all)
    
    # print the two Kappas
    print(f"For all responses, Cohen's Kappa is {cohen_score}")
    print(f"For all responses, Brennan-Prediger Kappa is {bp_kappa}")
    
    # loop over the dict with the number of times each code appears
    for k,v in code_count.items():
        # print how many times the code appears
        print(f"Code {k} appears {v} times")
        # if the code appeared in less than 10% of the responses
        if v < 0.1 * len(df) and k in codes_to_keep:
            # then remove it from the list of codes to keep
            codes_to_keep.remove(k)
    
    # print what codes are kept in the final thematic analysis
    print(f"Codes in the final thematic analysis: {codes_to_keep}")

The thematic analysis code can be run either for motivating or for demotivating factors. You need to pass the name of the file and the number of codes for the corresponding file (10 for the motivating factors and 9 for the demotivating factors).

In [4]:
thematic_analysis("../../../data/thematic_analysis_motivating.csv", 10)
For code 1, Cohen's Kappa is 0.7046843177189409
For code 1, Brennan-Prediger Kappa is 0.8722860791826309
For code 2, Cohen's Kappa is 0.5413005272407733
For code 2, Brennan-Prediger Kappa is 0.9233716475095786
For code 3, Cohen's Kappa is 0.9149560117302052
For code 3, Brennan-Prediger Kappa is 0.9744572158365262
For code 4, Cohen's Kappa is 1.0
For code 4, Brennan-Prediger Kappa is 1.0
For code 5, Cohen's Kappa is 1.0
For code 5, Brennan-Prediger Kappa is 1.0
For code 6, Cohen's Kappa is 0.903010033444816
For code 6, Brennan-Prediger Kappa is 0.9872286079182631
For code 7, Cohen's Kappa is 1.0
For code 7, Brennan-Prediger Kappa is 1.0
For code 8, Cohen's Kappa is 0.9715964740450539
For code 8, Brennan-Prediger Kappa is 0.9872286079182631
For code 9, Cohen's Kappa is 0.8512820512820513
For code 9, Brennan-Prediger Kappa is 0.9872286079182631
For code 10, Cohen's Kappa is 0.7883211678832117
For code 10, Brennan-Prediger Kappa is 0.9744572158365262
For all responses, Cohen's Kappa is 0.8728312678741659
For all responses, Brennan-Prediger Kappa is 0.9706257982120051
Code 1 appears 31 times
Code 2 appears 13 times
Code 3 appears 18 times
Code 4 appears 10 times
Code 5 appears 7 times
Code 6 appears 9 times
Code 7 appears 9 times
Code 8 appears 27 times
Code 9 appears 5 times
Code 10 appears 5 times
Codes in the final thematic analysis: [1, 2, 3, 4, 6, 7, 8]
In [5]:
thematic_analysis("../../../data/thematic_analysis_demotivating.csv", 9)
For code 1, Cohen's Kappa is 1.0
For code 1, Brennan-Prediger Kappa is 1.0
For code 2, Cohen's Kappa is 0.794392523364486
For code 2, Brennan-Prediger Kappa is 0.9872159090909092
For code 3, Cohen's Kappa is 1.0
For code 3, Brennan-Prediger Kappa is 1.0
For code 4, Cohen's Kappa is 0.9735576923076923
For code 4, Brennan-Prediger Kappa is 0.9872159090909092
For code 5, Cohen's Kappa is 0.7884615384615384
For code 5, Brennan-Prediger Kappa is 0.9744318181818182
For code 6, Cohen's Kappa is 1.0
For code 6, Brennan-Prediger Kappa is 1.0
For code 7, Cohen's Kappa is 0.6430020283975659
For code 7, Brennan-Prediger Kappa is 0.9488636363636364
For code 8, Cohen's Kappa is 1.0
For code 8, Brennan-Prediger Kappa is 1.0
For code 9, Cohen's Kappa is 1.0
For code 9, Brennan-Prediger Kappa is 1.0
For all responses, Cohen's Kappa is 0.9478260869565217
For all responses, Brennan-Prediger Kappa is 0.9886363636363638
Code 1 appears 39 times
Code 2 appears 6 times
Code 3 appears 4 times
Code 4 appears 31 times
Code 5 appears 7 times
Code 6 appears 6 times
Code 7 appears 8 times
Code 8 appears 5 times
Code 9 appears 4 times
Codes in the final thematic analysis: [1, 4]