Extracting Valuable Messages from ‘Alchemy of Souls’

▉ 30 Jun 2023      ▉ Duygu Dağdelen    

Alchemy of Souls was my guilty pleasure this past months. The South Korean TV series is set in a fictional country where the fates of two people become intertwined due to the forbidden ‘alchemy of souls’, which allows souls to switch bodies, while also introducing various political conflicts alongside. As a data scientist, I have a habit of seeing everything as a potential dataset that can be expressed in numbers and strings.

In this post, we will explore how fictional stories can help conceptualize the world, and I will share how I build a data-based application by each step: creating the data, applying NLP techniques to find the valuable lines, and finally building the application with my dataset. You can find the live website here.

How fictional stories can help conceptualize the world

We all enjoy a good story. Throughout history, telling stories has been one of our most fundamental communication methods. Interestingly, even in the realm of Data Science, we recognize and appreciate the influence of data storytelling. Stories go beyond mere entertainment; they possess a remarkable ability to teach us important life lessons, help us make sense of the world around us, and inspire us to think and act in new ways. Undoubtedly, they are a timeless and universal tool, capable of engaging, entertaining, and inspiring us all.

Korean series are well known for their phenomenal storytelling. They often have complex and well-developed plots with compelling characters, were various themes and issues are explored. Whether it’s the charming and charismatic leading man or the strong and independent leading woman, they have an interesting way of incorporating what a character is thinking into the plot. These characters are often written in a way that makes them feel like real people with flaws, hopes, and dreams. I have personally watched many K-dramas with relatable themes, that tackle pretty important issues. By depicting these themes in a way that is both honest and emotional, Korean dramas are able to connect with viewers on a deep level. Also worth mentioning are the visual elements, their high production values are no joke. It’s hard to look away from their high-quality cinematography, beautiful locations, and well-choreographed scenes.

Creating the data

To create the data, I turned to the English subtitles per episode, which where available in srt format. Using Python, I wrote a script to extract the dialogue from the subtitle files per episode and convert them into a csv file. This process involved identifying and selecting the relevant data, with the result of these output csv files containing all the spoken lines by the characters throughout the episode, along with timestamps, the season and episode number in which they were spoken. With this collection of prepped datasets in hand, I was ready to analyze the dialogue and find the meaningful, iconic lines that contain valuable messages using Natural Language Processing (NLP) techniques in Python.

Applying NLP techniques to find the valuable lines

Using a topic modeling technique like Latent Dirichlet Allocation (LDA) to extract themes and messages from the preprocessed text data:

from gensim import corpora
from gensim.models import LdaModel

# create a dictionary
dictionary = corpora.Dictionary(df_ep19['clean_text'].apply(lambda x: x.split()))

# create a corpus
corpus = [dictionary.doc2bow(text.split()) for text in df_ep19['clean_text']]

# train the LDA model
lda_model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=5, random_state=42)

Viewing the top 5 topics and the top words in each topic:

# print the topics and the top words in each topic
for idx, topic in lda_model.print_topics(-1):
    print(f"Topic: {idx} \nWords: {topic}\n")

Building a application with my dataset

If you’re curious to see the application, you can find it here. I hope you enjoy it as much as I enjoyed creating it, and that it inspires you to explore the meaningful messages that can be found in your favorite stories.