Name Matric Number
ALICIA TEO YAN LING A0207953Y
BRANDON ONG WEI-ZHI A0183065E
GRACE YIO JIA YI A0188904R
IZZ BIN LOKMAN A0200187N
YE JIADONG A0199832M

1. Case Description

1.1 Background Information

Cryptocurrencies have boomed in recent years, and a report by Allied Market Research estimates that the market will more than triple by 2030 (Crawley, 2021). To visualise the extent of price changes within the cryptocurrency market, we plot the price of Bitcoin, the world’s oldest cryptocurrency, against the stock price of an established company, Amazon. Amazon is chosen for comparison because its market capitalisation is currently of a similar order to Bitcoin’s (approximately 1.76 trillion USD versus 1.17 trillion USD, respectively), and because its stock price was close to Bitcoin’s price in early 2017 (approximately 900 USD).

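# Pull monthly Amazon price data from Yahoo Finance, keeping rows from January 2017 onwards
library(quantmod)  # getSymbols.yahoo()
library(dplyr)     # %>%, mutate(), filter()
amazon <- as.data.frame(getSymbols.yahoo("amzn", auto.assign = FALSE, periodicity = "monthly")) %>%
  tibble::rownames_to_column("Date") %>%  # the xts date index becomes a regular column
  mutate(Date = as.Date(Date), ticker = "Amazon") %>%
  filter(Date >= as.Date("2017-01-01"))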
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                  select(ticker, Date, 'AMZN.Close') %>% 
                                                  rename(value = 'AMZN.Close') %>% 
                                                  mutate(value = log10(value))

bitcoin <- as.data.frame(getSymbols.yahoo("btc-usd", auto.assign=FALSE, periodicity="monthly")) %>% tibble::rownames_to_column("Date") %>% 
                                                                                                    mutate(Date = as.Date(Date), ticker = "Bitcoin") %>% 
                                                                                                    filter(Date >= as.Date("2017-01-01")) %>% 
                                                                                                    select(ticker, Date, 'BTC-USD.Close') %>% 
                                                                                                    rename(value = 'BTC-USD.Close') %>% 
                                                                                                    mutate(value = log10(value))

bind <- rbind(amazon, bitcoin) %>% mutate(ticker = factor(ticker))

bind %>% ggplot(aes(x=Date, y=value, group=ticker, color=ticker)) +
         geom_line() +
         ggplot2::annotate(
          "text", x = as.Date('2021-07-01'), y = 5,
          label = "Bitcoin",
          family = "Lato",
          fontface = "bold",
          size = 6,
          color = "grey40"  # no trailing comma: an empty argument can raise an error
        ) +
         ggplot2::annotate(
          "text", x = as.Date('2020-12-30'), y = 3.8, 
          label = "Amazon",
          family = "Lato",
          fontface = "bold",
          size = 6,
          color = "grey40",
          hjust = 0
        ) +
        labs(title = "Bitcoin (USD) vs Amazon Inc.",
             caption = "Both Bitcoin and Amazon started out at the same price of ~900 USD in early 2017.",
             y = "Price  (Log Scale)", x="") + theme(axis.title.y = element_text(margin = margin(r = 15)))
Figure 1 (Source: Yahoo Finance)

From the graph, we notice that while Amazon and Bitcoin both started out at roughly 900 USD in early 2017, Bitcoin’s price quickly surpassed Amazon’s and, more importantly, has remained above it from 2017 until now. Presently, although the two share a similar market capitalisation, Bitcoin trades at almost 22 times the price of Amazon stock, demonstrating the significant price momentum that cryptocurrency products possess. While this renewed interest is partly driven by institutional investors, the ease of purchasing cryptocurrency through the myriad of trading platforms (e.g. Binance), coupled with the potential for lucrative returns, has spurred retail and personal investors to readily embrace it. In fact, a survey by the University of Chicago revealed that 13% of Americans had interacted with cryptocurrency in a 12-month period, and that these individuals tended to be younger and more diverse (Iacurci, 2021). In Singapore, a recent poll of 1000 respondents by Independent Reserve found that close to 43% owned cryptocurrency, lending further support to the growing adoption of cryptocurrency investment products across the world (Bourgi, 2021).

Going further, we visualise the age demographics of the countries with the highest cryptocurrency adoption. These countries have been identified as India, the United States, Nigeria and Vietnam (TripleA, 2021). We pull demographic data supplied by the United Nations, which gives the most recent projections of the age groups within these four countries, and plot it below.

countries <- c("India", "United States of America", "Nigeria", "Viet Nam")

# clean_names() comes from the janitor package; columns are selected by position in the UN file
population <- read.csv("un_pop_2019.csv") %>%
  clean_names() %>%
  filter(region_subregion_country_or_area %in% countries) %>%
  filter(reference_date_as_of_1_july == 2020) %>%
  select(c(3, 12, 43, 58)) %>%
  mutate(x0_14 = as.numeric(x0_14), x15_49 = as.numeric(x15_49), x50 = as.numeric(x50))

# Reshape to long format: one row per country and age group
population <- population %>%
  rename('0-14' = x0_14, '15-49' = x15_49, '50+' = x50) %>%
  gather(age_group, percentage_of_population, 2:4) %>%
  mutate(age_group = factor(age_group))
population %>% ggplot(aes(x=region_subregion_country_or_area, y=percentage_of_population, fill=age_group)) + 
               geom_bar(position = "dodge", stat = "identity") +
               coord_flip() +
               labs(subtitle = "Age Groups of Individuals in Countries with the highest Cryptocurrency Adoption",
                    x = "Countries", y="Percentage of Population") +
               theme(axis.title.y = element_text(margin = margin(r = 25)), axis.title.x = element_text(margin = margin(t = 20))) +
               theme(legend.position = "right", plot.subtitle=element_text(size=13))
Figure 2 (Source: United Nations and TripleA)

From the bar graph above, it is clear that the 15-49 age group forms the highest percentage of the total population in all four countries. With populations this young, it is likely that a greater proportion of the individuals driving cryptocurrency adoption within these countries are younger rookie traders, rather than established and experienced investors. This younger demographic group is also more savvy in using social media tools to communicate.

While cryptocurrencies have the potential to generate significant returns, they are also highly volatile assets that can expose investors to sizable amounts of risk. Experts have deemed them “purely speculative assets” (Csreinicke, 2021), and their unpredictable fluctuations have wiped out huge amounts of wealth during sudden downturns. For example, in March this year, Bitcoin lost almost 50% of its value in a two-day period, with other digital currencies following suit. On the other hand, they can also create tremendous returns, much like how the relatively unknown Shiba Inu coin rose almost 216% in a single week in October.

doge <- as.data.frame(getSymbols.yahoo("doge-usd", auto.assign=FALSE, periodicity="daily")) %>% 
                                                  tibble::rownames_to_column("Date") %>% 
                                                  mutate(Date = as.Date(Date)) %>% filter(Date >= as.Date("2021-06-01")) %>%
                                                  select(Date, 'DOGE-USD.Close') %>% 
                                                  rename(value = 'DOGE-USD.Close') 
doge %>% ggplot(aes(x=Date, y=value)) +
         geom_line(color = "blue") +  # a constant colour is set outside aes(), otherwise it is treated as a data value
         labs(title = "Dogecoin (USD) Prices from mid 2021",
             y = "Price (USD)", x="") + theme(axis.title.y = element_text(margin = margin(r = 15))) 
Figure 3 (Source: Yahoo Finance)

The chart of Dogecoin’s price from mid-2021 exemplifies how volatile cryptocurrencies can be, with wide swings in both directions within a span of six months.

reddit_sentiment_history <- read.csv('mean_sentiment_scores_reddit.csv', header=T) %>% mutate(date = as.Date(date, format="%d/%m/%y"))

coin_history_df <- coin_history(coin_id = "ethereum", vs_currency = "usd", days = 30)
coin_history_df$date<- substr(coin_history_df$timestamp,1,10)
coin_history_average <- coin_history_df%>% group_by(date) %>% summarise(average_price = mean(price)) %>% arrange(date) %>% mutate(date = as.Date(date))

merged_sentiment <- merge(reddit_sentiment_history, coin_history_average, by = "date") %>% tail(10)
merged_sentiment <- merged_sentiment %>% gather("type", "value", 2:3) 

gplottime <- merged_sentiment %>% ggplot(aes(x=date, y=value)) +
  geom_line(size=2, alpha=0.9, aes(color=type), group = 1) +
  geom_point(color = "#2c3e50") +
  theme(text = element_text(size=13)) +
  labs(
    x = NULL, y = NULL,
    title = "Avg Price of Ethereum (USD) vs Mean Sentiment Score of Reddit Comments",
    subtitle = paste("Comments collected from", (Sys.Date() - 10) , "to" , Sys.Date() - 1)
  ) + facet_wrap(~factor(type), scales = "free_y", nrow=2, strip.position = "left", 
                labeller = as_labeller(c(average_price  = "Average Price of ETH (USD)", mean_sentiment = "ETH Mean Sentiment Scores (Reddit)") ) ) +
     ylab(NULL) +
     theme(strip.background = element_blank(),
           strip.placement = "outside",
           text = element_text(size=13))

gplottime
Figure 4 (Source: Reddit and Coingecko)

Lastly, social media activity appears to be influential in the rise and fall of cryptocurrency markets: negative Twitter comments by Tesla CEO Elon Musk have been linked to a wipe-out of almost USD 300 billion across the cryptocurrency market (Kharpal & Browne, 2021). To illustrate the influence of social media sentiment, Figure 4 shows the average price of Ethereum alongside the mean sentiment scores of over 5000 Reddit comments in the Ethereum subreddit over a 10-day period. The sentiment scores were generated using the sentiment analysis toolkits outlined in the subsequent sections. The two plots move together with a delay. After the mean sentiment score began rebounding on Nov 04, the average Ethereum price began to rebound 2 days later, on Nov 06. When mean sentiment began to fall on Nov 07, the average price followed suit 2 days later, on Nov 09. Although 10 days is a short window, this pattern is consistent with the theory that social media sentiment influences cryptocurrency price movements rather than the converse, and it illustrates a “lag time” of roughly 1-2 days before sentiment is reflected in prices.
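
A lag of this kind can also be checked numerically rather than by eye. The following is a minimal sketch on made-up numbers (not the Reddit or Coingecko data used for Figure 4): it shifts a sentiment series by 0 to 3 days and computes the Pearson correlation with price at each shift. The helper name lagged_cor and every value in it are hypothetical, and the price series is deliberately constructed to echo sentiment two days later.

```r
# Hypothetical daily series, for illustration only (not the project's data).
sentiment <- c(0.10, 0.05, 0.20, 0.30, 0.25, 0.15,
               0.05, 0.10, 0.20, 0.30, 0.35, 0.25)

# Price is built to follow sentiment with a two-day delay;
# the first two values are arbitrary placeholders.
price <- c(4300, 4310, 4000 + 1000 * sentiment[1:10])

# Correlation between sentiment on day t and price on day t + lag
lagged_cor <- function(x, y, lag) {
  n <- length(x)
  stopifnot(length(y) == n, lag >= 0, lag < n - 1)
  cor(x[1:(n - lag)], y[(1 + lag):n])
}

lags <- 0:3
cors <- sapply(lags, function(k) lagged_cor(sentiment, price, k))
print(data.frame(lag_days = lags, correlation = round(cors, 3)))

# The lag with the highest correlation is the best-supported delay
best_lag <- lags[which.max(cors)]
print(best_lag)
```

On real data the correlations would be far noisier, and a 10-day window is too short to draw firm conclusions, so a check like this is best run over a longer history.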

1.2 Current Problem

With speculation being a key driver of cryptocurrency markets, investors require a means to analyze the emotional reactions of the market. Social media, where millions of users gather daily, is a natural source of this data, especially since the population in the countries with the highest cryptocurrency adoption skews younger and is hence more social media savvy. However, there is no easy-to-use, readily accessible platform, scaled down for the retail and personal investors who represent the younger generation of cryptocurrency investors, that visualises social media activity on cryptocurrency-related topics. Furthermore, it is prohibitively difficult for an end-user to immediately gauge the general sentiment towards a particular cryptocurrency due to the huge quantity of tweets sent on a daily basis. Cryptocurrency tweets are also not organised in a concise manner, leaving end-users to scroll through endless pages of potentially unrelated tweets. Data from BitInfoCharts shows that at the height of the Bitcoin craze in May 2021, there were approximately 100-200 thousand tweets on Bitcoin alone. It is challenging for time-strapped investors to sift through these and grasp a common sentiment. If analyzing one platform is already this challenging, repeating the exercise across multiple discussion platforms (e.g. Twitter, Reddit) multiplies the workload.

As prominent investor André Kostolany pointed out, facts are only responsible for 10% of overall market activity, while the rest is attributable to psychology (Nann, 2019). Without an accessible means to gauge the overall sentiment of users on major social media platforms, retail and personal investors would find it extremely challenging to arrive at an informed decision on their cryptocurrency purchases, leading to an increased risk of incurring potentially huge losses.

1.3 Available Solutions

There are several available solutions in the market to assist potential cryptocurrency investors in coming to a more informed decision. They are outlined in the following table.

Available Tool | Pain Points for Consumers
Sole Technical Analysis Tools (e.g. FX Street) | These do not consider speculative behaviour from consumers. They are also largely the insights of individual analysts and do not take into consideration sentiments from the wider market.
Market Sentiment Indexes (e.g. Bitcoin Fear and Greed Index) | These are offered by trading exchanges and involve a level of technical complexity that layman retail investors might not easily grasp.
Cryptocurrency Trading Apps (e.g. Binance) | These apps rank cryptocurrencies by trading volume (i.e. which cryptocurrency is hot in the market) in real time. However, trading volume does not necessarily correlate with overall market sentiment, so this is a very rough gauge.
Google Trends | While this allows users to look at the popularity of search terms, it does not shed light on social media discussions or the direction of the sentiments (positive / negative).
Whale Monitoring (e.g. Whale Alert) | These tools monitor the actual trades performed by big players in the crypto market, but do not give an indication of the overall market sentiment.
Comprehensive sentiment analysis tools | While these advanced tools can analyze social media sentiments, they are also expensive and more suitable for institutional investors in cryptocurrencies.


Even with the available solutions, multiple pain points remain for consumers. For one, several of the abovementioned solutions do not deliver real-time visualisation. For example, technical analysis is usually carried out at most once every 24-48 hours, and may no longer be relevant by the time it is read, given the volatility of cryptocurrencies. On the other hand, solutions that monitor specific individuals cannot take into consideration the sentiments of the wider market.

1.4 Our Proposal

Blockchain company TripleA estimates that there are close to 300 million cryptocurrency users around the world (Lisa, 2021). With the growing and widespread adoption of cryptocurrency, demand is likely to increase significantly for tools that provide insights for more informed cryptocurrency investment decisions. Since this growing interest is largely centered on the younger generation, who are more familiar with social media platforms, we propose Decrypt, a real-time visualisation app built in R that actively crawls and analyzes sentiment on two major social media platforms: Twitter and Reddit.

Users will be able to select from the top 20 cryptocurrencies and view visualisations based on the analysis of related tweets. Trending words for the respective cryptocurrencies will be displayed to give users greater insight into the factors driving market sentiments. To ensure relevance, the top 20 cryptocurrencies will be extracted in real time (i.e. based on live market capitalisation data pulled from Coingecko’s API). An example of the current top 10 cryptocurrencies based on market capitalisation is outlined in the table below.

Top 10 Cryptocurrencies
Bitcoin (BTC) | Ethereum (ETH)
Binance Coin (BNB) | Cardano (ADA)
Ripple (XRP) | Dogecoin (DOGE)
Polkadot (DOT) | Solana (SOL)
Litecoin (LTC) | Stellar (XLM)

In order to offer a holistic service, our platform will also provide real-time technical analysis tools that combine historical cryptocurrency price trends with current pricing data. These allow users to analyze cryptocurrencies using established financial fundamentals, and to conduct price forecasts and portfolio optimizations. This adds value to real-time social media sentiment analyses, as technical analysis provides an additional dimension on the actual buying and selling conditions of cryptocurrencies. For example, sentiment analyses alone cannot reveal whether a particular cryptocurrency is overbought or oversold, whereas technical analyses can give us insight into the formation of price bubbles, which might signal a potential downward crash.
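
One common way to flag overbought or oversold conditions is the Relative Strength Index (RSI). The following is a minimal sketch, assuming the TTR package (a dependency of quantmod) and a simulated price series rather than market data. RSI is used purely as an example indicator, the 70/30 cut-offs are a widely used convention rather than a rule, and the variable names are illustrative.

```r
library(TTR)  # provides RSI(); TTR is a dependency of quantmod

set.seed(42)
# Simulated daily closing prices (random walk), for illustration only
prices <- 100 * cumprod(1 + rnorm(120, mean = 0.001, sd = 0.03))

# 14-period RSI: values lie between 0 and 100,
# with NA for the initial warm-up period
rsi <- RSI(prices, n = 14)

# Conventional bands: above 70 = overbought, 30 or below = oversold
signal <- cut(rsi,
              breaks = c(0, 30, 70, 100),
              labels = c("oversold", "neutral", "overbought"),
              include.lowest = TRUE)

# How many days fall into each band
print(table(signal, useNA = "ifany"))
```

A sentiment score can be strongly positive on the same day that such an indicator reads overbought, which is the kind of disagreement the combined view is meant to surface.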

In short, our objectives are the following:

  • [Ask Why, View Diversely] Provide technical analysis, and sentiment analysis capabilities to our users
  • [Attend] Present only relevant, and most important data
  • [Simplify] Provide a simple to use and appealing user-interface for users to easily navigate our service
  • [Respond] Allow users to share data we captured

2. Business Model

2.1 Our Target Audience

Taking into consideration our demographic analysis above, we aim to target the younger and more diverse group of retail/personal investors, who likely make up the majority of cryptocurrency investors. Data from Hootsuite also suggests that more than 80% of Twitter users are below the age of 50 (Sehl, 2020). This group is likely not as well-resourced as institutional or career investors, but its members can similarly appreciate the value of cryptocurrency assets. In fact, more than 50% of young investors in the UK are actively investing in cryptocurrencies (Kiderlin, 2021). This shows that there is great potential in a product that gives these young investors, who are active on social media themselves, the opportunity to gauge market sentiments from the social media interactions of their peers.

2.2 Path to Profitability

We expect that our user base will increase in tandem with the rise in interest and demand for cryptocurrency-related services. With cryptocurrency growth rates estimated at around 30% year on year (Facts & Factors, 2021), we are confident of building a sizable user base, which would allow us to generate revenue from in-app advertisements for cryptocurrency products. Furthermore, after an initial one-month free trial, we intend to charge users a nominal subscription fee so that they can continue relying on our services to make more informed cryptocurrency investment decisions.

3. Methodology

3.1 Overview

All of our data sources are real-time in nature. This ensures data authenticity and relevance, as relying on historical CSV data would invalidate many of the graphs and sentiment scores that we present to users. Cryptocurrency markets move at a lightning pace, and it is important to present up-to-date data to our users. We consolidate real-time and historical market information (open, high, low and close prices) from both Yahoo Finance, using functions from the quantmod package, and Coingecko’s API. Coingecko is also used to identify, in real time, the top 20 cryptocurrencies that users can select from. Tweet and Reddit comment data are obtained mainly from the readily available Twitter and Reddit APIs, then cleaned and restructured to derive useful visualisations such as word clouds and perceptual maps. Line graphs incorporate historical price trends.

3.2 Data Sources

3.2.1 Historical and Current Market Data

Platform | API Link | R package used
Coingecko | http://api.coingecko.com/api/v3 | geckor
Yahoo Finance | https://yfapi.net/ | quantmod

Historical and current data of actual price and market trends for cryptocurrencies will be fetched using the real-time CoinGecko API as well as Yahoo Finance. Data collected here is instrumental for our technical analysis.

3.2.2 Social Media Interactions

Social media data is obtained from the respective API endpoints exposed by the social media platforms below. The data is then filtered, processed and visualised.

Platform | API Link | R package used
Twitter | https://developer.twitter.com/en/docs/twitter-api | rtweet
Reddit | https://www.reddit.com/dev/api/ | RedditExtractoR

Sentiment analysis will then be performed with the assistance of the tidytext and textdata packages which are available in R. These packages assist in evaluating the emotions that are tied to textual elements, in order for us to arrive at a sentiment scoring system for the searched cryptocurrency.
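
A minimal sketch of this kind of scoring is shown below, assuming the Bing lexicon that ships with tidytext and a simple rule of +1 per positive word and -1 per negative word. The example posts, column names and scoring rule are illustrative assumptions and not necessarily the exact pipeline used in the app.

```r
library(dplyr)
library(tidytext)

# Made-up posts, for illustration only
posts <- data.frame(
  id   = c(1, 2),
  text = c("Ethereum looks great today, love the upgrade",
           "Terrible week, this dip is awful"),
  stringsAsFactors = FALSE
)

scores <- posts %>%
  unnest_tokens(word, text) %>%                        # one lowercase word per row
  inner_join(get_sentiments("bing"), by = "word") %>%  # keep only words in the lexicon
  mutate(value = ifelse(sentiment == "positive", 1, -1)) %>%
  group_by(id) %>%
  summarise(sentiment_score = sum(value), .groups = "drop")

print(scores)
```

Words that do not appear in the lexicon contribute nothing, so a post with no matched words drops out of the result entirely instead of scoring zero.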

4. App Description and Methodology

Deployed Shiny App Site: https://decrypt.shinyapps.io/decrypt

The site will be live only for a few days, as it has a high memory requirement and hosting costs on shinyapps.io are expensive. If the site is not live, the Shiny App can be started by clicking the Run button in the app.R file. The API keys to access Twitter are still valid at the point of submission.

4.1 App Functionality

Our app provides a convenient platform for users to select the cryptocurrencies they are interested in and view both sentiment and technical analysis data about them. Within sentiment analysis, users can select either Twitter or Reddit to view data pertaining to that platform. Besides a word cloud visualisation and a plot of the most frequently occurring words in tweets / Reddit comments, users will also be able to view the sentiment trend over time, as well as by tweet / Reddit comment ID.

Besides this, users can also conduct technical analyses of their selected cryptocurrencies by viewing important technical metrics and trends that are plotted automatically, and view predictions of future cryptocurrency prices generated with Facebook’s Prophet package. The prediction is driven by the historical pricing data of the cryptocurrencies. Lastly, they can select up to 3 cryptocurrencies to optimize as a portfolio.

4.2 App Development

In order to fulfill our objectives outlined above, we designed our app user interface to comprise 4 main tabs:

  • Twitter Sentiment Analysis
  • Reddit Sentiment Analysis
  • Technical Analysis
  • About Us

In order for our user interface to be appealing yet simple, we decided to utilise the argonDash package to build our dashboard. As argonDash is built with Bootstrap, our user interface scales automatically according to screen dimensions, including on mobile devices with a smaller screen resolution. Our sidebar menu allows users to navigate through the app with a few simple clicks. In addition, argonDash lets us create tabs within pages, so that we can place graphs without cluttering a single page.

We outline some of the libraries that are used to power our major functionalities, as well as our front-end.

Libraries for Major Functionalities:

  • prophet (for prediction)
  • quantmod (Yahoo Finance)
  • PerformanceAnalytics (technical analysis charts)
  • PortfolioAnalytics (technical analysis)
  • pso (portfolio optimization)
  • ggplot2 (graphing)
  • plotly (interactive graphs)
  • dygraphs (interactive graphs)
  • visNetwork (interactive network visualisation)
  • shinyscreenshot (screenshot capture)
  • timevis (timeline visualisation)

Libraries for Front-end Shiny App:

  • shinyjs (enable javascript functions)
  • shinyWidgets (inputs and options)
  • shinycssloaders (loading animation)
  • shinydisconnect (disconnect message)

In the following section, we proceed to discuss the specific implementation of our major functionalities together with the codes that are packaged together to build our eventual app.

5. Twitter Sentiment Analysis

5.1 Authenticating Twitter Access

In order to access real-time tweets, we obtained developer access to Twitter’s API, which requires specific API keys for authentication. This section initiates the authentication process, but the code is not shown due to the presence of API secrets. We note the following limitations on free usage of Twitter’s API, and we work within them in our project. However, it is possible to upgrade to the bearer token for more relaxed limits in production.

Limitation | Description
Overall Twitter API limits | Limit of 18,000 tweets per fifteen minutes
Maximum Date Limits | Historical tweets can be pulled up to a maximum of the past 7 days.
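
To put the rate limit in perspective, the arithmetic below estimates how many fifteen-minute windows are needed to pull a given number of tweets at 18,000 tweets per window. It is a rough sketch that assumes every window is used to its full allowance; the function name windows_needed is illustrative.

```r
tweets_per_window <- 18000  # free-tier limit quoted in the table above
window_minutes    <- 15

# Number of fifteen-minute windows needed to fetch n_tweets,
# assuming each window is used to its full allowance
windows_needed <- function(n_tweets) {
  ceiling(n_tweets / tweets_per_window)
}

# A single request batch of 18,000 tweets fits in one window
print(windows_needed(18000))

# 200,000 tweets (the upper end of the Bitcoin volume cited earlier)
# needs 12 windows, i.e. 180 minutes of quota
print(windows_needed(200000))
print(windows_needed(200000) * window_minutes)
```

This is why the analysis works on a sample of recent tweets per coin rather than attempting to collect every tweet.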

5.2 Obtaining the top-performing 20 cryptocurrency coins

We considered two approaches to allowing users to search for the cryptocurrency coin that they wish to analyze. The two approaches are outlined below:

  1. Allow a free-text field for users to key in any cryptocurrency coin for us to analyze.
  2. Provide a fixed set of the top 20 cryptocurrency coins based on market capitalization for users to choose from.

We then considered the advantages and disadvantages of both approaches, before we explain how we came to our final decision.

Advantages of Option 1

  • There is greater flexibility for users to analyze metrics of any cryptocurrency coin, even those that are less well-known.

Disadvantages of Option 1

  • As users can input any search term, they might end up searching for unrelated topics, which dilutes our product's focus on cryptocurrency.
  • Unrelated search terms also prevent us from carrying out proper analysis, due to inconsistent data.

Advantages of Option 2

  • This would allow us a more focused approach, by ensuring that users search within the parameters of cryptocurrencies only.
  • Data analysis and visualisation can be more targeted to cryptocurrency products, instead of having to deal with wide-ranging input possibilities.

Disadvantages of Option 2

  • There would be less flexibility to search for less well-known coins.

Having considered the pros and cons of both options, we find that in the majority of use cases, our users will find information pertaining to the top-performing cryptocurrencies more meaningful, as these are also the coins that they are more likely to consider an investment in. Less well-known coins are also not as widely listed on multiple cryptocurrency exchanges, and this further limits their popularity and potential interest. We hence decided to strike a balance between flexibility and ease of development/use, by limiting users to a reasonable list of the top 20 best-performing cryptocurrency coins based on market capitalization.

To pull this list of the top 20 supported cryptocurrencies, we require a separate API that provides real-time information on the cryptocurrency market, so that we can filter for the necessary coins. We had two possibilities, Coingecko and Coinmarketcap, and eventually decided to go with Coingecko as it had a more relaxed API usage allowance (50 calls per minute).

# We make use of Coingecko's public API endpoints in order to pull the top 20 cryptocurrencies by market capitalisation
url <- "https://api.coingecko.com/api/v3/coins/markets?vs_currency=usd&order=market_cap_desc&per_page=20&page=1"
data <- fromJSON(url)
data <- as.data.frame(data)
head(data)
##            id symbol         name
## 1     bitcoin    btc      Bitcoin
## 2    ethereum    eth     Ethereum
## 3 binancecoin    bnb Binance Coin
## 4      tether   usdt       Tether
## 5      solana    sol       Solana
## 6     cardano    ada      Cardano
##                                                                                  image
## 1             https://assets.coingecko.com/coins/images/1/large/bitcoin.png?1547033579
## 2          https://assets.coingecko.com/coins/images/279/large/ethereum.png?1595348880
## 3 https://assets.coingecko.com/coins/images/825/large/binance-coin-logo.png?1547034615
## 4       https://assets.coingecko.com/coins/images/325/large/Tether-logo.png?1598003707
## 5           https://assets.coingecko.com/coins/images/4128/large/Solana.jpg?1635329178
## 6           https://assets.coingecko.com/coins/images/975/large/cardano.png?1547034860
##   current_price   market_cap market_cap_rank fully_diluted_valuation
## 1      64889.00 1.225187e+12               1            1.363290e+12
## 2       4637.72 5.502500e+11               2                      NA
## 3        646.87 1.090433e+11               3            1.090433e+11
## 4          1.00 7.496850e+10               4                      NA
## 5        237.61 7.220240e+10               5                      NA
## 6          2.05 6.599064e+10               6            9.260720e+10
##   total_volume high_24h    low_24h price_change_24h price_change_percentage_24h
## 1  28973811611 65739.00 6.3636e+04     890.32000000                     1.39114
## 2  13425127376  4727.90 4.6045e+03      -9.50330430                    -0.20449
## 3   2104679872   658.88 6.2185e+02      19.74000000                     3.14757
## 4  57548483221     1.01 9.9845e-01       0.00020127                     0.02007
## 5   2066938639   241.74 2.2568e+02      11.38000000                     5.02864
## 6   1287627002     2.08 2.0300e+00       0.00176077                     0.08587
##   market_cap_change_24h market_cap_change_percentage_24h circulating_supply
## 1           16052323565                          1.32759           18872681
## 2           -1127796503                         -0.20454          118333501
## 3            3358724421                          3.17807          168137036
## 4             -57427235                         -0.07654        74751886939
## 5            3664026291                          5.34595          303212739
## 6             150480851                          0.22855        32066390668
##   total_supply  max_supply      ath ath_change_percentage
## 1     21000000    21000000 69045.00              -6.06181
## 2           NA          NA  4878.26              -4.73427
## 3    168137036   168137036   686.31              -5.53961
## 4  74751886939          NA     1.32             -24.45706
## 5    508180964          NA   259.96              -8.42581
## 6  45000000000 45000000000     3.09             -33.46931
##                   ath_date         atl atl_change_percentage
## 1 2021-11-10T14:24:11.849Z 67.81000000          9.555014e+04
## 2 2021-11-10T14:24:19.604Z  0.43297900          1.073234e+06
## 3 2021-05-10T07:24:17.097Z  0.03981770          1.628040e+06
## 4 2018-07-24T00:00:00.000Z  0.57252100          7.457964e+01
## 5 2021-11-06T21:54:35.825Z  0.50080100          4.743495e+04
## 6 2021-09-02T06:00:10.474Z  0.01925275          1.056727e+04
##                   atl_date roi.times roi.currency roi.percentage
## 1 2013-07-06T00:00:00.000Z        NA         <NA>             NA
## 2 2015-10-20T00:00:00.000Z  94.77724          btc       9477.724
## 3 2017-10-19T00:00:00.000Z        NA         <NA>             NA
## 4 2015-03-02T00:00:00.000Z        NA         <NA>             NA
## 5 2020-05-11T19:35:23.449Z        NA         <NA>             NA
## 6 2020-03-13T02:22:55.044Z        NA         <NA>             NA
##               last_updated
## 1 2021-11-14T04:13:52.400Z
## 2 2021-11-14T04:14:12.483Z
## 3 2021-11-14T04:14:27.668Z
## 4 2021-11-14T04:12:55.331Z
## 5 2021-11-14T04:14:48.889Z
## 6 2021-11-14T04:14:09.346Z
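
Since the public endpoint is rate-limited, a single request can occasionally fail. The sketch below shows one way to retry a request with an increasing wait between attempts. The helper name with_retry and the stand-in flaky_fetch are hypothetical and are not part of our app; the stand-in simulates two failed calls so that the example runs without network access.

```r
# Hypothetical helper: call `fetch` until it succeeds or attempts run out
with_retry <- function(fetch, max_attempts = 3, wait_seconds = 1) {
  stopifnot(max_attempts >= 1)
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    # Wait longer after each failure (1x, 2x, 4x, ...)
    if (attempt < max_attempts) Sys.sleep(wait_seconds * 2^(attempt - 1))
  }
  stop("All ", max_attempts, " attempts failed: ", conditionMessage(result))
}

# Stand-in for the real request: fails twice, then returns a small table
calls <- 0
flaky_fetch <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated rate-limit error")
  data.frame(id = c("bitcoin", "ethereum"), symbol = c("btc", "eth"))
}

top_coins <- with_retry(flaky_fetch, wait_seconds = 0)
```

With the real API, the same helper could wrap the request as with_retry(function() fromJSON(url)).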

5.3 Pulling Twitter Tweets

We pull tweets from Twitter for every cryptocurrency supplied in a list. This step makes use of the rtweet package, which wraps the necessary API connections in simple and easy-to-use R functions. In our analyses here, we include both the Ethereum and Bitcoin cryptocurrencies.

# Get vector of dates from today to 7 days ago, due to Twitter API limitation
dates <- seq.Date(Sys.Date() - 6 , Sys.Date(), by="days")

# List of cryptocurrencies to search, this will be passed in as an input list in the shiny app function
crypto <- list("ethereum", "bitcoin")
# Create empty dataframe 
tweets <- data.frame()

# Search for tweets by topic, removing retweets to reduce duplicates impacting results
for (coin in crypto) {
  for (i in seq_along(dates)) {
    search <- search_tweets(coin, n=100, include_rts = FALSE,  until= dates[i])
    search_text <- search %>% select(screen_name, created_at, text)
    search_text$currency <- coin
    tweets <- rbind(tweets, search_text)
  }
}

# Sort by coin
tweets %>% arrange(currency)
## # A tibble: 1,342 × 4
##    screen_name     created_at          text                             currency
##    <chr>           <dttm>              <chr>                            <chr>   
##  1 zoe_squonk      2021-11-07 23:59:56 "@BaneNook first mayor to go to… bitcoin 
##  2 SirLRonHODLer   2021-11-07 23:59:53 "$ENJ is going full cup and han… bitcoin 
##  3 Crypto88888     2021-11-07 23:59:53 "How Bitcoin Has Performed Comp… bitcoin 
##  4 Crypto88888     2021-11-07 23:59:52 "Bitcoin Demand Trends Downward… bitcoin 
##  5 MyPrettyAzz     2021-11-07 23:59:52 "App that lets you earn Bitcoin… bitcoin 
##  6 trading_alerts_ 2021-11-07 23:59:50 "🔴🟡🟢ALERTA🔴🟡🟢\nCierre y a… bitcoin 
##  7 ShootsMissesNFT 2021-11-07 23:59:50 "@raritytools @NFTBillionaires … bitcoin 
##  8 BitcoinFeesCash 2021-11-07 23:59:49 "Current Bitcoin transaction fe… bitcoin 
##  9 CryptoPricing   2021-11-07 23:59:49 "Bitcoin Price (USD): 63301.52 … bitcoin 
## 10 James_PeterG    2021-11-07 23:59:45 "@w_s_bitcoin What on earth are… bitcoin 
## # … with 1,332 more rows
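
The nested loop above grows the data frame with rbind on every iteration. As a rough alternative, the sketch below collects each result in a list and binds once at the end. The function fake_search is a hypothetical stand-in for search_tweets so that the example runs without Twitter credentials; it assumes, as we understand the search API, that `until` returns tweets created before the given date.

```r
# Hypothetical stand-in for search_tweets(): one fake tweet per call
fake_search <- function(coin, until) {
  data.frame(
    screen_name = paste0("user_", coin),
    created_at  = until - 1,            # tweets are dated before `until`
    text        = paste("a tweet about", coin),
    stringsAsFactors = FALSE
  )
}

dates <- seq.Date(as.Date("2021-11-08"), by = "day", length.out = 7)
coins <- c("ethereum", "bitcoin")

# One row per (coin, day) combination to query
grid <- expand.grid(coin = coins, day = seq_along(dates), stringsAsFactors = FALSE)

pieces <- lapply(seq_len(nrow(grid)), function(k) {
  res <- fake_search(grid$coin[k], until = dates[grid$day[k]])
  res$currency <- grid$coin[k]
  res
})

# Bind all results in a single step
tweets_demo <- do.call(rbind, pieces)
```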

5.4 Perform pre-processing on the tweets

Tweets are often extremely dirty and contain tokens that add no value for analysis (e.g. emojis). Hence, proper pre-processing is extremely important to make the tweets as clean and useful as possible for further analysis. Here, we adapt the methodology that is briefly discussed in https://stackoverflow.com/questions/31348453/how-do-i-clean-twitter-data-in-r

# Perform pre-processing to ensure sentiment analysis is as accurate as possible
clean_tweets <- function(x) {
  x %>%
          # Remove URLs
          str_remove_all(" ?(f|ht)(tp)(s?)(://)(.*)[.|/](.*)") %>%
          # Remove mentions e.g. "@my_account"
          str_remove_all("@[[:alnum:]_]{4,}") %>%
          # Remove hashtags
          str_remove_all("#[[:alnum:]_]+") %>%
          # Replace "&" character reference with "and"
          str_replace_all("&amp;", "and") %>%
          # Remove punctuation
          str_remove_all("[[:punct:]]") %>%
          # Remove "RT: " from beginning of retweets
          str_remove_all("^RT:? ") %>%
          # Replace any newline characters with a space
          str_replace_all("\\\n", " ") %>%
          # Make everything lowercase
          str_to_lower() %>%
          # Remove any non alphabetical characters, including numbers
          str_replace_all("[^[:alpha:]]", " ") %>%
          # Remove non-english characters
          iconv(from="UTF-8", to="ASCII", sub="") %>% 
          # Remove unnecessary whitespace in between words
          str_replace_all("\\s+"," ") %>%
          # Remove any trailing whitespace around the text
          str_trim("both") 
    }

# Clean text with above function
tweets$text <- tweets$text %>% clean_tweets 

# Extract date from datetime of created_at
tweets$created_at <- as.Date(strftime(tweets$created_at, format="%Y-%m-%d"))

# Remove any duplicate tweets
tweets <- tweets[!duplicated(tweets$text), ]

# Remove any NA rows
tweets <- tweets %>% na.omit

# Assign incrementing index column to keep track of tweets
tweets$tweet_id <- as.numeric(seq_len(nrow(tweets)))

# Split string by word into separate rows
tweets_unnested <- tweets %>% unnest_tokens(word, text)

# Remove any stop words
tweets_unnested <- tweets_unnested %>% anti_join(stop_words) %>% rename(author = screen_name, text = word)

# Remove words of two characters or fewer, since they are unlikely to be meaningful
tweets_unnested <- tweets_unnested[nchar(tweets_unnested$text) >= 3,]
tweets_unnested
## # A tibble: 7,921 × 5
##    author        created_at currency tweet_id text    
##    <chr>         <date>     <chr>       <dbl> <chr>   
##  1 SoriceMichael 2021-11-08 ethereum        1 purchase
##  2 SoriceMichael 2021-11-08 ethereum        1 multiple
##  3 SoriceMichael 2021-11-08 ethereum        1 ethereum
##  4 SoriceMichael 2021-11-08 ethereum        1 gas     
##  5 SoriceMichael 2021-11-08 ethereum        1 fees    
##  6 SoriceMichael 2021-11-08 ethereum        1 worst   
##  7 NOTshireHODL  2021-11-08 ethereum        2 ethereum
##  8 alexgranados  2021-11-08 ethereum        3 cuando  
##  9 alexgranados  2021-11-08 ethereum        3 toca    
## 10 alexgranados  2021-11-08 ethereum        3 pagar   
## # … with 7,911 more rows
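
To make the effect of the cleaning steps concrete, the following is a simplified base R sketch of the same idea applied to a single made-up tweet. It is not the clean_tweets function above: the name clean_tweet_base is hypothetical, and it deliberately uses a narrower URL pattern that stops at the next whitespace, whereas the pattern above also removes any text that follows a URL.

```r
# Simplified, hypothetical variant of the cleaning pipeline using base R only
clean_tweet_base <- function(x) {
  x <- gsub("https?://\\S+", "", x)            # remove URLs up to the next space
  x <- gsub("@[[:alnum:]_]+", "", x)           # remove mentions
  x <- gsub("#[[:alnum:]_]+", "", x)           # remove hashtags
  x <- gsub("&amp;", "and", x, fixed = TRUE)   # replace the "&" character reference
  x <- tolower(x)                              # lowercase everything
  x <- gsub("[^a-z]+", " ", x)                 # keep ASCII letters only
  trimws(x)                                    # drop leading/trailing whitespace
}

raw <- "@bob Gas fees &amp; ETH are UP 5%! https://t.co/abc #eth"
cleaned <- clean_tweet_base(raw)
cleaned
# "gas fees and eth are up"
```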

5.5 Perform 3D Network Visualisation of Connected Cryptocurrencies

Now that all of the tweets have been unnested into individual words, it is insightful to visualise the connections between the various cryptocurrencies. Investors sometimes want to find out which cryptocurrencies are related to one another, such that when one cryptocurrency does well, a linked cryptocurrency is likely to ride on its peer's trajectory. This gives the investor an opportunity to invest in the connected coin as well. Related cryptocurrencies are often discussed within the same social media conversations, so we plot a network map with the JavaScript graphing tool visNetwork, by picking out instances of other cryptocurrency coins that appear within the tweets.

When plotting the network map, we take care to ensure there are no self-loops, and also attempt to match on both the cryptocurrency's symbol (e.g. eth, btc) and its full name (e.g. ethereum, bitcoin).

# Expand on data of top 20 cryptocurrencies to get as many matching names as possible
duplicate <- data
duplicate <- duplicate %>% select(id) %>% mutate(matching_name = id)
matchingdf <- data %>% select(id, symbol) %>% rename(matching_name = symbol) %>% bind_rows(duplicate)

# Create nodelist for network visualisation
nodes <- matchingdf %>% rename(name_of_coin = id) %>% rowid_to_column("id") %>% select(id, name_of_coin) %>% head(20)

network <- tweets_unnested

# Remove self-loops (currency maps back to itself)
mergecurrency <- function(currency, text) {
  df <- matchingdf %>% filter(id == currency)
  return(text %in% df$matching_name)
}

network <- network %>% select(text, currency) %>% filter(text %in% matchingdf$matching_name) %>% select(2, 1) %>% group_by(currency, text) %>% summarize(count = n())

network$self_loop = FALSE

for (i in 1:nrow(network)) {
  network[i, "self_loop"] = mergecurrency(network[i, "currency"]$currency, network[i, "text"]$text)
}

network <- network %>% filter(!self_loop & count > 1) %>% select(currency, text, count) %>% rename(from = currency, to = text, frequency = count)
network <- merge(network, matchingdf, by.x='to', by.y='matching_name') %>% select(from, id, frequency) %>% rename(to = id) %>% group_by(from, to) %>% summarize(frequency = sum(frequency))
network <- merge(network, nodes, by.x='from', by.y='name_of_coin')  %>% select(id, to, frequency) %>% rename(from = id)
network <- merge(network, nodes, by.x='to', by.y='name_of_coin')  %>% select(from, id, frequency) %>% rename(to = id) %>% arrange(from)

# Make a palette of 3 colors
col  <- brewer.pal(3,"Blues")
nodesize<- nodes
for(id in c(1:20)){
  nodesize$size[id]<-network %>% filter(from == id | to==id) %>%  summarise(size=sum(frequency))
}
nodesize$size<- unlist(nodesize$size)
# Create a vector of color
my_color<- col[as.numeric(cut(nodesize$size, breaks=3))]
sizingcat<- as.numeric(cut(nodesize$size, breaks=3))

networkedges<- data.frame(from = network$from, to = network$to, 
                          label=as.character(network$frequency),
                          length = 300,
                          smooth = TRUE)

visnodes<- data.frame(
  id= nodes$id,
  label = nodes$name_of_coin,                              
  shape = "circle", 
  font.size=sizingcat*9,
  color = my_color)
visnodes <- visnodes %>% filter((id %in% c(network$from,network$to)))
visnodes$shadow <- TRUE # Nodes will drop shadow

# 3D Network Map
visNetwork(visnodes, networkedges, width = "100%", main="Connections between Crypto Coins") %>%
#adding the option to choose coins from a drop down list and to also highlight the chosen coin and its relationships
visOptions(highlightNearest =  list(enabled = TRUE, algorithm = "hierarchical", degree = list(from = 0, to = 1)), nodesIdSelection = list(enabled = TRUE)) %>%
  visEvents(type = "once", startStabilizing = "function() {
    this.moveTo({scale:0.5})}") %>%
  visPhysics(stabilization = FALSE)

Figure 5

As the network map above for Ethereum and Bitcoin shows, it is easy to see which other cryptocurrencies appear within the tweets of these two coins. The edge weights represent how frequently these cryptocurrencies appear within the tweets of their connected peers (in this case Bitcoin / Ethereum). Bitcoin and Ethereum are clearly highly connected to each other in social discussions, which is reasonable since their prices move in tandem with each other the majority of the time. It is also possible to filter the cryptocurrencies that appear by using the dropdown provided.
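
The edge-building logic can be illustrated on a toy example. The sketch below is a simplified base R version, not the exact pipeline above: it maps each token to a coin id through a small alias table, drops self-loops, and counts the remaining co-mentions. The data and the names aliases, tokens and edges are made up for illustration.

```r
# Alias table: both the full name and the symbol map to the same coin id
aliases <- data.frame(
  id            = c("bitcoin", "bitcoin", "ethereum", "ethereum"),
  matching_name = c("bitcoin", "btc", "ethereum", "eth"),
  stringsAsFactors = FALSE
)

# Toy tokens: `currency` is the coin searched for, `text` is a word in its tweets
tokens <- data.frame(
  currency = c("bitcoin", "bitcoin", "bitcoin", "ethereum", "ethereum", "ethereum"),
  text     = c("btc", "eth", "ethereum", "bitcoin", "gas", "eth"),
  stringsAsFactors = FALSE
)

# Keep only tokens that name a coin, and resolve them to the coin id
hits <- merge(tokens, aliases, by.x = "text", by.y = "matching_name")
names(hits)[names(hits) == "id"] <- "to"

# Drop self-loops (a coin mentioned in its own tweets)
hits <- hits[hits$currency != hits$to, ]

# Count co-mentions per (from, to) pair
hits$n <- 1L
edges <- aggregate(n ~ currency + to, data = hits, FUN = sum)
edges
```

Here "btc" in a Bitcoin tweet and "eth" in an Ethereum tweet are discarded as self-loops, leaving two Bitcoin-to-Ethereum mentions and one Ethereum-to-Bitcoin mention.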

5.6 Determine Sentiment Polarity

Now that we have pulled and processed the tweets, we proceed to data visualisation. We need to first assign each word in each tweet with a sentiment score. Sentiment polarity is calculated with the following steps:

  1. Using a sentiment library (afinn), assign a sentiment score to each recognised word after the tweets have been unnested into individual words. Sentiment scores can be positive or negative, with 0 being neutral. The number of recognised words will be much fewer than the original dataset, as the sentiment dictionary is of a fixed size.
  2. Sum up all of the sentiment scores for words that are tagged to the same tweet id. This will be the overall sentiment score assigned to the entire tweet itself.
# Get sentiment dictionary. Here, we are using the afinn dictionary, as it allows us to assign specific polarity scores.
sentiment_dataset <- get_sentiments("afinn")
# Merge with the sentiment dataset to obtain our sentiment scores for each word
afinn_sentiment <- tweets_unnested %>% rename(word = text) %>% merge(sentiment_dataset, by = 'word')
# Sum up sentiment scores to get the sentiment polarity for entire tweet
afinn_sentiment <- afinn_sentiment %>% arrange(tweet_id) %>% group_by(tweet_id) %>% summarize(sentiment_polarity = sum(value)) %>%  merge(tweets, by = 'tweet_id') %>% select(tweet_id, currency, sentiment_polarity, screen_name, created_at, text)
head(afinn_sentiment)
##   tweet_id currency sentiment_polarity   screen_name created_at
## 1        1 ethereum                 -3 SoriceMichael 2021-11-08
## 2        4 ethereum                 -1 markusoreal32 2021-11-08
## 3        7 ethereum                  0     BulldudeC 2021-11-08
## 4       17 ethereum                  5   THE_MAGNATE 2021-11-08
## 5       19 ethereum                  2      RixxTech 2021-11-08
## 6       20 ethereum                  2      RixxTech 2021-11-08
##                                                                                                                                                                                                                 text
## 1                                                                                                                                      how can i go purchase are there multiple ways ethereum gas fees are the worst
## 2 given the amount of evidence uncovered proving is a security its hard to believe that the fbi are not acting or addressing the public maybe there collecting more evidence to bury these scammers wishful thinking
## 3                                                                                                                        last h top and flop cryptos to watch kda uma elon hnt lrc avax stz xrp enj shib sol btc eth
## 4                                                                                                                       another beautiful and strong w candle just need a little power from the bulls to fix the ltf
## 5                                                                                                                                                                                             hour top movers report
## 6                                                                                                                                                                                            daily top movers report
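
As a small worked illustration of the two steps, the sketch below scores three made-up tweets against a tiny hand-written lexicon using base R. The lexicon values are chosen for illustration only and should not be read as the full afinn dictionary.

```r
# Tiny illustrative lexicon (word, value), standing in for the sentiment dictionary
lexicon <- data.frame(
  word  = c("worst", "beautiful", "strong"),
  value = c(-3, 3, 2),
  stringsAsFactors = FALSE
)

# Unnested words, one row per word, tagged with the tweet they came from
words <- data.frame(
  tweet_id = c(1, 1, 1, 2, 2, 3),
  word     = c("gas", "fees", "worst", "beautiful", "strong", "report"),
  stringsAsFactors = FALSE
)

# Step 1: keep only recognised words and attach their scores
scored <- merge(words, lexicon, by = "word")

# Step 2: sum the scores per tweet to get the tweet-level polarity
polarity <- aggregate(value ~ tweet_id, data = scored, FUN = sum)
polarity
```

Tweet 3 contains no recognised word, so it drops out of the result entirely, which is why the scored dataset is smaller than the original set of tweets.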

5.7 Visualizing overall sentiment polarity by Tweet ID

We start off with visualising the sentiment polarity of tweets by tweet ids. Tweet ids are the unique indexes that we assign to each tweet, in order to differentiate them from one another.

Besides plotting the polarity of each tweet, we included the following lines for reference:

  1. A smoothed line to indicate the general trend of sentiment polarity of the tweets. If this smoothed line is above the x-axis, then it is likely that overall sentiment polarity is more positive. The converse applies if the line is below the x-axis.
  2. The mean sentiment polarity of all tweets is also captured as a dotted blue line.
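  3. Lower and upper reference bounds, computed as median ± 1.96 × IQR, are shown as a solid and a dotted red line respectively. The legend labels them the 25th and 75th percentile sentiment polarity, although they lie at least as far out as the actual quartiles. Any tweets with sentiment polarity scores beyond either of these two lines are likely outliers / extreme values.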

Due to the use of the interactive plotly library, we are also able to enable mouse-over pop-ups, and information about the content of each tweet as well as the user is displayed.
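
For reference, the horizontal lines can be computed directly from the polarity scores. The sketch below uses a small made-up vector of scores and compares the median ± 1.96 × IQR bounds with the actual 25th and 75th percentiles; the variable names are illustrative only.

```r
# Made-up polarity scores for six tweets
scores <- c(-3, -1, 0, 2, 2, 5)

mean_line  <- mean(scores)
spread     <- IQR(scores)                      # interquartile range
upper_line <- median(scores) + 1.96 * spread   # upper red line
lower_line <- median(scores) - 1.96 * spread   # lower red line

# Actual quartiles, for comparison with the two bounds
quartiles <- quantile(scores, c(0.25, 0.75))

c(mean = mean_line, lower = lower_line, upper = upper_line)
quartiles
```

On this toy vector the bounds fall outside the quartiles, so only scores well away from the bulk of the tweets would be flagged as extreme.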

label_wrap <- label_wrap_gen(width = 60)

afinn_sentiment_formatted <- afinn_sentiment %>% filter(currency == "ethereum") %>% mutate(formatted_tweet = str_glue("Tweet ID: {tweet_id}
                                      Screen Name: {screen_name}
                                      Text: {label_wrap(text)}"))

gplot <- afinn_sentiment_formatted %>%
  ggplot(aes(tweet_id, sentiment_polarity)) +
  geom_line(color = "#2c3e50", alpha = 0.5) +
  geom_point(aes(text = formatted_tweet), color = "#2c3e50") +
  geom_smooth(method = "loess", span=0.25, se=FALSE, color="blue") +
  geom_hline(aes(yintercept = mean(sentiment_polarity), linetype = "Mean Sentiment Polarity"), color="blue") +
  geom_hline(aes(yintercept = median(sentiment_polarity) + 1.96 * IQR(sentiment_polarity), linetype = "75th Percentile Sentiment Polarity"), color="red") +
  geom_hline(aes(yintercept = median(sentiment_polarity) - 1.96 * IQR(sentiment_polarity), linetype = "25th Percentile Sentiment Polarity"), color="red") +
  theme_minimal() + 
  theme(legend.direction = "horizontal", legend.background = element_rect(fill = "white", colour = "gray30")) +
  labs(title = "Sentiment Polarity for Ethereum by Tweet ID", x = "Tweet ID", y="Sentiment Polarity")

ggplotly(gplot, tooltip = "text") %>% layout(legend = list(orientation = 'h', xanchor = "center", x = 0.5, y= -0.7), xaxis = list(rangeslider = list(type="date")))

Figure 6

With this information, users can gain insights on whether the sentiments derived from the tweets are consistent across different Twitter users. If the trend-line fluctuates sharply, opinion on the cryptocurrency is likely divided, and users may wish to hold off buying until a more consistent positive trend emerges.

5.8 Visualizing overall sentiment polarity by Date

Next, it is also useful for users to see how the overall mean sentiment polarity of all tweets changes over the most recent 7-day period. We adopt the same methodology outlined above for visualization by tweet id, except that grouping is done by date instead.

Due to the use of the interactive plotly library, we are also able to enable mouse-over pop-ups, and information about the mean sentiment polarity of each day is displayed.

mean_over_time <- afinn_sentiment %>% group_by(currency, created_at) %>% summarize(mean_sentiment = mean(sentiment_polarity)) %>% mutate(formatted_text = str_glue("Currency: {currency}
          Date: {created_at}
          Mean Sentiment Score: {round(mean_sentiment, 2)}"))

gplottime <- mean_over_time %>% ggplot(aes(x=created_at, y=mean_sentiment)) +
  geom_line(aes(colour=currency), size=2, alpha=0.9) +
  geom_point(aes(text = formatted_text), color = "#2c3e50") +
  theme_minimal() +
  theme(plot.title = ggplot2::element_text(face = "bold")) +
  labs(
    x = NULL, y = NULL,
    title = "Mean Sentiment Score of Tweets",
  )

ggplotly(gplottime, tooltip = "text") %>% layout(title = list(text = paste0('Mean Sentiment Score of Tweets',
                                                                            '<br>',
                                                                            '<sup>',
                                                                            paste("Tweets collected from", (Sys.Date() - 6) , "to" , Sys.Date()))), 
                                                 xaxis = list(rangeslider = list(type="date")))

Figure 7

This not only allows users to perform comparisons between the sentiments of various cryptocurrencies, but also gain insight over the likely movement directions of these sentiments. Users can observe historical averaged sentiments over the past 7 days to make relative comparisons of today’s sentiment scores versus the past.

For our final shinyApp output, we also need to pull and display the most recent day's mean sentiment score, and compare it with that of the previous day to determine whether sentiment rose or fell. The pipeline below computes this.

mean_over_time <- afinn_sentiment %>% group_by(currency, created_at) %>% summarize(mean_sentiment = mean(sentiment_polarity)) %>% mutate(formatted_text = str_glue("Currency: {currency}
          Date: {created_at}
          Mean Sentiment Score: {round(mean_sentiment, 2)}"), yesterday_sentiment = dplyr::lag(mean_sentiment, n = 1, default = NA)) %>% arrange(desc(created_at)) %>% slice_head() %>% select(mean_sentiment, yesterday_sentiment) %>% mutate(mean_sentiment = round(mean_sentiment, 2), up = mean_sentiment > yesterday_sentiment, diff = round(((mean_sentiment - yesterday_sentiment)/yesterday_sentiment)*100, 2)) %>% arrange(currency)
head(mean_over_time)
## # A tibble: 2 × 5
## # Groups:   currency [2]
##   currency mean_sentiment yesterday_sentiment up     diff
##   <chr>             <dbl>               <dbl> <lgl> <dbl>
## 1 bitcoin            1.56               1.5   TRUE      4
## 2 ethereum           1.8                0.667 TRUE    170
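
The day-over-day comparison can be sketched in base R as follows. This is a simplified variant on made-up daily means, not the pipeline above: the hypothetical helper pct_change divides by the absolute value of the previous day's score and returns NA when that score is zero or missing, since a percentage change is hard to interpret in those cases.

```r
# Made-up daily mean sentiment for one coin
daily <- data.frame(
  created_at     = as.Date("2021-11-12") + 0:2,
  mean_sentiment = c(0.5, 1.5, 1.8)
)

# Previous day's value, aligned to each row (first day has none)
daily <- daily[order(daily$created_at), ]
daily$yesterday_sentiment <- c(NA, head(daily$mean_sentiment, -1))

# Hypothetical helper: percentage change, guarded against a zero baseline
pct_change <- function(today, yesterday) {
  if (is.na(yesterday) || yesterday == 0) return(NA_real_)
  100 * (today - yesterday) / abs(yesterday)
}

latest <- daily[nrow(daily), ]
up     <- latest$mean_sentiment > latest$yesterday_sentiment
change <- round(pct_change(latest$mean_sentiment, latest$yesterday_sentiment), 2)
```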

5.9 Most frequent words and wordcloud

Having observed sentiment polarity trends, potential investors will be keen to know about words that appear very frequently in the tweets that we have analyzed. Hence, we want to pick out the words that appear with the highest frequency in our tweets.

We also decide that this frequency calculation is done individually for each cryptocurrency, as combined analysis with other cryptocurrency coins will likely render the output not as focused or relevant to any coin in particular.

frequentdf <- tweets_unnested %>% filter(currency == "ethereum")
frequentdf %>%
  count(text, sort = TRUE) %>%
  top_n(15) %>%
  mutate(text = reorder(text, n)) %>%
  ggplot(aes(x = text, y = n)) +
  geom_col() +
  theme(axis.text.y = element_text(size=14), text = element_text(size=12)) + 
  coord_flip() +
  labs(x = "Words",
  y = "Frequency of unique words",
  title = "Most frequent words found in tweets")

Figure 8

We can also visualize the same information, but in a wordcloud.

set.seed(1234)
wordclouddf <- tweets_unnested %>% filter(currency == "ethereum") %>% group_by(text) %>% summarize(frequency = n()) %>% rename(words = text) %>% filter(frequency >= 10) 
wordclouddf <- wordclouddf %>% rename(word = words, freq = frequency)
wordcloud2(data = wordclouddf, size=3, color="random-light", backgroundColor="white")

Figure 9

6. Deployment as Shiny App

We deployed all of the features above onto our Shiny App. Users can select up to 3 of the top 20 cryptocurrencies that they would like to analyze, and see the same graphs and charts that have been presented above. We decided not to make the input field here reactive, as the Twitter and subsequent API calls are extremely resource-intensive, and take a significant time to load if they reacted to each change. Hence, users have to click on the “Run” button in order to begin analyses.
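
A minimal sketch of this non-reactive pattern is shown below, assuming the shiny package is installed. The input ids, labels and the short coin list are hypothetical placeholders rather than our app's actual code; the point is that eventReactive only re-evaluates when the button is clicked.

```r
library(shiny)

# Hypothetical coin list; in the app this would come from the top-20 query
coin_choices <- c("bitcoin", "ethereum", "solana", "cardano")

ui <- fluidPage(
  # Allow at most three coins to be selected
  selectizeInput("coins", "Coins (up to 3)", choices = coin_choices,
                 multiple = TRUE, options = list(maxItems = 3)),
  actionButton("run", "Run"),
  textOutput("summary")
)

server <- function(input, output, session) {
  # Re-evaluated only when "Run" is clicked, not on every selection change
  selected <- eventReactive(input$run, input$coins)
  output$summary <- renderText(
    paste("Analysing:", paste(selected(), collapse = ", "))
  )
}

# Build the app object; runApp(app) would launch it
app <- shinyApp(ui, server)
```

In the real app, the expensive Twitter and Coingecko calls would sit inside the eventReactive expression, so that they run once per click.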

Figure 10

Furthermore, users can also toggle to the second tab to see the wordcloud, as well as sentiment plots. They do require some time to load, as data is pulled in real-time from Twitter’s servers.

Figure 11

6.1 Reddit 3D Network Graph

Since the code for Reddit sentiment analysis is relatively similar to Twitter's, we have placed it in the appendix. To add value beyond the Twitter analysis, we embed a 3D network visualisation using all of the top 20 cryptocurrencies. This can only be carried out for Reddit, as Twitter has strict API rate limits.

6.2 3D Network Visualisation for all top 20 cryptocurrencies

Figure 12

6.3 Deployment as Shiny App

Reddit is also relied on as a source of information about users' conversations. Here, we show what the Reddit page will look like once deployed.

Figure 13

7. Technical Analysis

Having conducted sentiment analysis, which serves as an important indicator of social-media interest for cryptocurrency coins, it is still critical to conduct foundational technical analysis to analyse the coins with sound financial principles. This provides a different perspective to purely qualitative analysis, and provides a more quantitative approach to gauging the buying / selling conditions of various cryptocurrencies.

7.1 Loading Data

We start by using the same Coingecko API to pull current market information on cryptocurrencies.

# We make use of Coingecko's public API endpoints to pull current market information for the coins returned by the markets endpoint
url <- "https://api.coingecko.com/api/v3/coins/markets?vs_currency=usd"
data <- fromJSON(url)
# Convert to a data frame and keep only the columns needed for the market overview
market_data <- as.data.frame(data) %>% select(symbol, id, current_price, market_cap, total_volume, high_24h, low_24h, price_change_24h, price_change_percentage_24h)
head(market_data)
##   symbol          id current_price   market_cap total_volume high_24h
## 1    btc     bitcoin      64889.00 1.225187e+12  28973811611 65739.00
## 2    eth    ethereum       4637.72 5.502500e+11  13425127376  4727.90
## 3    bnb binancecoin        646.87 1.090433e+11   2104679872   658.88
## 4   usdt      tether          1.00 7.496850e+10  57548483221     1.01
## 5    sol      solana        237.61 7.220240e+10   2066938639   241.74
## 6    ada     cardano          2.05 6.599064e+10   1287627002     2.08
##      low_24h price_change_24h price_change_percentage_24h
## 1 6.3636e+04     890.32000000                     1.39114
## 2 4.6045e+03      -9.50330430                    -0.20449
## 3 6.2185e+02      19.74000000                     3.14757
## 4 9.9845e-01       0.00020127                     0.02007
## 5 2.2568e+02      11.38000000                     5.02864
## 6 2.0300e+00       0.00176077                     0.08587

7.2 Current Market Capitalisation

We plot a treemap that illustrates the current market capitalisation of each coin relative to the others. Market capitalisation can change quickly and is extremely volatile, as seen when the Shiba Inu coin surpassed the more established Dogecoin in just 9-10 months. Hence, it is important to be able to keep track of the relative proportions of each coin, in relation to the wider cryptocurrency market.

The treemap mainly outlines the most relevant cryptocurrency coins, ordered by their market capitalisation, which represents the total value of a cryptocurrency (Price of a Cryptocurrency * Total number of cryptocurrency coins in circulation). The market capitalisation also gives us an idea of the coin's stability. Cryptocurrencies with larger market capitalisations are generally more stable.
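
As a quick check of this formula, the sketch below multiplies Bitcoin's price by its circulating supply, using the figures from the Coingecko output shown earlier in this report, and compares the result with the reported market capitalisation. The small remaining gap is presumably due to rounding and to the figures being captured at slightly different moments, which is an assumption on our part.

```r
# Figures for Bitcoin taken from the Coingecko output earlier in this report
price               <- 64889.00
circulating_supply  <- 18872681
reported_market_cap <- 1.225187e12

# Market capitalisation = price * coins in circulation
implied_market_cap <- price * circulating_supply

# Relative difference between the implied and reported values
relative_gap <- abs(implied_market_cap - reported_market_cap) / reported_market_cap
implied_market_cap
relative_gap
```

The implied value is roughly 1.2246 trillion USD, within a fraction of a percent of the reported figure.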

df1 <- na.omit(market_data[,c('id','market_cap')])
df1$market_cap <- as.numeric(df1$market_cap)
df1$formatted_market_cap <-  paste0(df1$id,'\n','$',format(df1$market_cap,big.mark = ',',scientific = F, trim = T))
treemap(df1, index = 'formatted_market_cap', vSize = 'market_cap', title = 'Cryptocurrency Market Cap', fontsize.labels=c(20, 10), palette='RdYlGn', algorithm = "squarified", aspRatio = 2)

Figure 14

7.3 Historical Market Capitalisation

What if users want to view historical market capitalisation data for the cryptocurrencies that they are keen on? We pull that information from Coingecko as well and plot a line chart. This allows users to visualise how the value of a cryptocurrency has changed over time.

coinids <- list("ethereum", "bitcoin")
historical_market_df <- data.frame()
for (coin in coinids) {
  # Pull 180 days of daily history for each coin
  historical_market <- coin_history(coin,
                                    vs_currency = "usd",
                                    days = 180,
                                    interval = "daily",
                                    max_attempts = 3
                                   )
  historical_market_df <- rbind(historical_market_df, historical_market) 
}

# Market capitalisation is converted to USD millions and plotted on a log10 scale for ease of comparison
options(scipen = 999)
cbbPalette <- c("#CC79A7", "#0072B2", "#D55E00")
plot <- historical_market_df %>% mutate(date = as.Date(lubridate::date(timestamp)), coin_id = factor(coin_id), market_cap = market_cap / 1000000) 
plot %>% ggplot(aes(x=date, y=log10(market_cap), colour=coin_id)) + geom_line() + scale_colour_manual(values=cbbPalette) +
  scale_x_date(date_labels="%b %y",date_breaks  ="1 month") +
  theme(text = element_text(size=13)) +
  labs(title="Market Capitalisation Over Time",
       x ="Date", y = "log10 of Market Capitalisation (USD millions)")

Figure 15

7.4 Plotting Technical Analysis Charts

In addition, it is important for investors to understand if a cryptocurrency is in an overbought or oversold condition. Investors will also typically find it useful to be able to gain insights into the future price movements of cryptocurrencies. Hence, we plot a list of technical indicators to further add value for investors to be able to more holistically assess the viability of any investment into a cryptocurrency. Some of the technical indicators that we display are explained below.

Bollinger Bands: The two dotted red lines indicate whether the cryptocurrency is in overbought or oversold conditions. If the price deviates too far from its moving average, it is likely to correct itself.

  • Below the lower band: Crypto is oversold and should rebound.
  • Above the upper band: Crypto is overbought and is due for a pullback.

Simple Moving Average (SMA): The blue and red lines show the average price over the previous 20 and 30 days respectively. The SMA helps to indicate whether prices will continue their trend or reverse.
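As a rough illustration of how these two indicators are derived, the sketch below computes a 20-day moving average and bands two standard deviations either side of it in base R. It runs on a synthetic random-walk price series rather than real market data. The helper `roll_apply` is a hypothetical name introduced here, and the use of the population standard deviation is an assumption of this sketch.

```r
# Synthetic closing prices (random walk); illustrative only, not market data
set.seed(42)
close <- 100 + cumsum(rnorm(60))

n <- 20  # look-back window, as in addSMA(n = 20) and addBBands(n = 20)
k <- 2   # band width in standard deviations, as in sd = 2

# Apply f to the trailing n observations; NA until a full window is available
roll_apply <- function(x, n, f) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i >= n) out[i] <- f(x[(i - n + 1):i])
  }
  out
}

# Population standard deviation (divides by n); an assumption of this sketch
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

sma   <- roll_apply(close, n, mean)
sdev  <- roll_apply(close, n, pop_sd)
upper <- sma + k * sdev
lower <- sma - k * sdev

# Flag days that close outside the bands
signal <- ifelse(close > upper, "overbought",
          ifelse(close < lower, "oversold", "within bands"))
print(table(signal, useNA = "ifany"))
```

The first 19 observations have no signal because a full 20-day window is not yet available.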

Relative Strength Index (RSI): The RSI measures the magnitude of recent price changes on a scale from 0 to 100.

  • >70: Crypto is overbought and primed for a reversal.
  • <30: Crypto is oversold / underbought and could rebound.
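The thresholds above can be sketched in base R as follows. This is a minimal implementation of the RSI with Wilder's smoothing, written for illustration on synthetic prices. The function name `rsi_wilder` is hypothetical and this is not the code used by quantmod.

```r
# RSI with Wilder's smoothing; returns a vector aligned with `close`
rsi_wilder <- function(close, n = 14) {
  stopifnot(length(close) > n)
  change <- diff(close)
  gain <- pmax(change, 0)   # upward moves, zero otherwise
  loss <- pmax(-change, 0)  # downward moves as positive numbers

  avg_gain <- rep(NA_real_, length(change))
  avg_loss <- rep(NA_real_, length(change))
  # Seed with the simple mean of the first n changes
  avg_gain[n] <- mean(gain[1:n])
  avg_loss[n] <- mean(loss[1:n])
  if (length(change) > n) {
    for (i in (n + 1):length(change)) {
      avg_gain[i] <- (avg_gain[i - 1] * (n - 1) + gain[i]) / n
      avg_loss[i] <- (avg_loss[i - 1] * (n - 1) + loss[i]) / n
    }
  }
  # Equivalent to 100 - 100 / (1 + RS), with RS = avg_gain / avg_loss
  rsi <- 100 * avg_gain / (avg_gain + avg_loss)
  c(NA_real_, rsi)  # pad because diff() drops the first observation
}

# Synthetic closing prices (random walk); illustrative only
set.seed(1)
close <- 100 + cumsum(rnorm(60))
rsi <- rsi_wilder(close, n = 14)

# Values up to 30 are labelled oversold, values above 70 overbought
state <- cut(rsi, breaks = c(-Inf, 30, 70, Inf),
             labels = c("oversold", "neutral", "overbought"))
print(tail(data.frame(close = round(close, 2), rsi = round(rsi, 1), state), 5))
```

A series that only rises yields an RSI of 100 and one that only falls yields 0, which is a quick sanity check on the formula.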

Stochastic Momentum Index (SMI): The SMI measures the momentum of closing prices by locating the close relative to the midpoint of the recent high-low range. In an uptrend, prices tend to close near their highs; in a downtrend, they tend to close near their lows.

The charting makes use of the chartSeries function that is available in the quantmod package.

options("getSymbols.warning4.0"=FALSE)
myPars <- chart_pars()
myPars$mar <- c(3, 2, 1, 1)
myPars$cex <- 5 # Increase font size of both x and y axis scale ticks
x <- getSymbols('eth-USD', auto.assign=FALSE) %>% na.omit() %>% na.approx()
chartSeries(x, pars=myPars, type="auto", subset = 'last 3 months', theme = chartTheme("white"), log.scale = FALSE,
            TA=c(addVo(),
                 addSMA(n=20, col='blue'),  # 20-day simple moving average
                 addSMA(n=30, col="red"),   # 30-day simple moving average
                 addBBands(n=20, sd=2, maType="SMA"),
                 addRSI(n=14, maType="EMA", wilder=TRUE),
                 addSMI(n=13,slow=25,fast=2,signal=9,ma.type="EMA")))
Figure 16

7.5 Optimizing Portfolio Returns (Determine weights of the different portfolios)

Investors might be keen on diversifying their portfolio by accumulating different cryptocurrencies. It is valuable for them to gain insight into how best to optimize their overall portfolio, based on the relative proportion of each coin held. For example, for an investor who intends to hold both Ethereum and Bitcoin, the optimal allocation proposed might be 50-50. Here, we leverage the particle swarm optimization (PSO) method, which moves particles around a search space to uncover an optimal allocation of cryptocurrencies in one’s portfolio.
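Whatever search method is used, each candidate allocation is judged on the mean and standard deviation of the resulting portfolio return. The sketch below computes these two quantities for a few candidate weightings of two hypothetical coins, using synthetic returns. It does not perform the particle swarm search itself, and `portfolio_stats` is a name introduced here for illustration.

```r
# Synthetic daily returns for two hypothetical coins (not real price data)
set.seed(123)
returns <- cbind(coin_a = rnorm(250, mean = 0.002, sd = 0.04),
                 coin_b = rnorm(250, mean = 0.001, sd = 0.02))

mu    <- colMeans(returns)  # mean return per asset
sigma <- cov(returns)       # covariance matrix of returns

# Mean and standard deviation of a portfolio with weights w
portfolio_stats <- function(w, mu, sigma) {
  stopifnot(abs(sum(w) - 1) < 1e-8)  # full-investment constraint
  c(mean = sum(w * mu), sd = sqrt(drop(t(w) %*% sigma %*% w)))
}

# Candidate allocations, one per row
candidates <- rbind(c(0.5, 0.5), c(0.4, 0.6), c(0.6, 0.4))
stats <- t(apply(candidates, 1, portfolio_stats, mu = mu, sigma = sigma))
print(cbind(w_a = candidates[, 1], w_b = candidates[, 2], round(stats, 5)))
```

An optimiser searches over many such weight vectors, subject to the constraints, for the combination that best trades off mean against standard deviation.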

tickers <- c("btc-usd", "MATIC-usd", "xrp-usd", 'eth-usd', 'ada-usd')

portfolioPrices <- NULL
for(ticker in tickers) {
  portfolioPrices <- cbind(portfolioPrices,
                           getSymbols.yahoo(ticker, auto.assign=FALSE)[,4])
}

portfolioReturns <- na.omit(ROC(portfolioPrices))

portf <- portfolio.spec(colnames(portfolioReturns))

portf <- add.constraint(portf, type="weight_sum", min_sum=1, max_sum=1)
portf <- add.constraint(portf, type="box", min=.10, max=.40)
portf <- add.objective(portf, type="return", name="mean")
portf <- add.objective(portf, type="risk", name="StdDev")

optPort <- optimize.portfolio(portfolioReturns, portf, optimize_method = "pso", trace=TRUE)
chart.Weights(optPort)
Figure 17

These are the optimised portfolio weights. They can also be viewed as a data frame below:

as.data.frame(extractWeights(optPort))
##                 extractWeights(optPort)
## BTC.USD.Close                 0.4048125
## MATIC.USD.Close               0.1015245
## XRP.USD.Close                 0.1375832
## ETH.USD.Close                 0.2458509
## ADA.USD.Close                 0.1102288
ef <- extractEfficientFrontier(optPort, match.col = "StdDev", n.portfolios = 25,
                         risk_aversion = NULL)

chart.EfficientFrontier(ef,
                        match.col = "StdDev", n.portfolios = 25, xlim = NULL, ylim = NULL,
                        cex.axis = 0.8, element.color = "darkgray", main = "Efficient Frontier",
                        RAR.text = "SR", rf = 0, tangent.line = TRUE, cex.legend = 0.8,
                        chart.assets = TRUE, labels.assets = TRUE, pch.assets = 21,
                        cex.assets = 0.8)
Figure 18

The efficient frontier plotted above represents the set of optimal portfolios that offer the highest expected return for a given level of risk. Any portfolio that lies below the efficient frontier is sub-optimal, as it does not deliver sufficient return for its level of risk.
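The idea of a portfolio being "below the frontier" can be illustrated with a dominance check. The sketch below draws random long-only portfolios from synthetic returns and keeps those for which no other sampled portfolio has both lower (or equal) risk and higher (or equal) return. This is only a sampled approximation for illustration and is not how extractEfficientFrontier computes the frontier.

```r
# Synthetic daily returns for three hypothetical assets (not real data)
set.seed(7)
n_assets <- 3
returns <- matrix(rnorm(250 * n_assets, mean = 0.001, sd = 0.03), ncol = n_assets)
mu    <- colMeans(returns)
sigma <- cov(returns)

# Random long-only weights, one portfolio per row, each row summing to 1
n_port <- 500
w <- matrix(runif(n_port * n_assets), ncol = n_assets)
w <- w / rowSums(w)

port_mean <- drop(w %*% mu)
port_sd   <- sqrt(rowSums((w %*% sigma) * w))  # sqrt(w' Sigma w) for each row

# Portfolio i is dominated if another one is at least as good on both
# risk and return, and strictly better on at least one of them
is_dominated <- sapply(seq_len(n_port), function(i) {
  any(port_sd <= port_sd[i] & port_mean >= port_mean[i] &
      (port_sd < port_sd[i] | port_mean > port_mean[i]))
})
frontier <- which(!is_dominated)

cat(length(frontier), "of", n_port, "sampled portfolios are non-dominated\n")
```

Sorted by risk, the non-dominated portfolios have strictly increasing mean return, which is the upward-sloping shape of the frontier.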

7.6 Predicting Prices

Lastly, being able to visualise the likely trend of prices in the near future gives investors insight into the growth prospects of cryptocurrencies. We use Facebook’s prophet package for this: it takes in historical price data and generates a forecast for the next 365 days.

coin_history_df <- coin_history(coin_id = "bitcoin", vs_currency = "usd", days = "max") %>% mutate(timestamp = as.Date(timestamp)) %>% drop_na()

# Since it is difficult to visualise price changes when prices span very high and very low values, we log-transform the prices
ds <- coin_history_df$timestamp
y <- log(coin_history_df$price)
df <- data.frame(ds, y)
m <- prophet(df)
future <- make_future_dataframe(m, periods = 365)
forecast <- predict(m, future)
dyplot.prophet(m, forecast)
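Because the model is fitted to log prices, its forecasts are also on the log scale and must be exponentiated to be read in USD. The sketch below shows this back-transformation on synthetic data, using a simple linear model as a stand-in for prophet. The data and model are assumptions made purely for illustration.

```r
# Synthetic prices growing roughly exponentially (illustrative only)
set.seed(10)
day   <- 1:200
price <- 100 * exp(0.01 * day + rnorm(200, sd = 0.05))

# Fit a straight line to log(price); this stands in for the prophet model
fit <- lm(log(price) ~ day)

# Forecast 30 days ahead on the log scale, with a prediction interval
future   <- data.frame(day = 201:230)
log_pred <- predict(fit, newdata = future, interval = "prediction")

# Back-transform to USD; exp() preserves the ordering of the interval bounds
usd_pred <- exp(log_pred)
print(round(head(usd_pred, 3), 2))
```

For the prophet forecast, the analogous step would be to apply exp() to the yhat, yhat_lower and yhat_upper columns before reporting prices.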

Figure 19

7.7 See other Prophet Prediction/Forecast Components

In addition to the predictions generated, we are also able to visualise other trends that were observed in historical price data.

theme_set(theme_grey())
prophet_plot_components(m, forecast)
Figure 20

7.8 Deployment of Technical Analysis as Shiny App

Our technical analysis is also deployed to our Shiny App, with additional features to change the plot options (e.g. theme). Users can rely on the charts to gain further insight into the direction of a cryptocurrency’s price movements.

Figure 21

Figure 22

Price trend prediction and portfolio optimization are also available in the Shiny App, giving users further input for their investment decisions.

8. Limitations

8.1 Limited cryptocurrency sentiment data

Twitter and Reddit alone might be insufficient to gauge worldwide cryptocurrency sentiment accurately. Individuals in different countries rely on different social media platforms, and hence might not all congregate on Twitter and Reddit. One possibility would be to scrape additional cryptocurrency forums such as CryptoInTalk, but this would increase processing and waiting time on the app. Nevertheless, Twitter and Reddit are significant contributors to cryptocurrency sentiment, and should be sufficient to give users insight into the overall sentiment.

8.2 Longer response time on our app

As we are scraping live data from multiple data sources, the response time depends on how fast the respective APIs can retrieve and return results. Twitter’s API is faster than Reddit’s, so users can expect a slightly longer response time when viewing Reddit sentiment data. In addition, the technical analysis tools require time to pull the necessary market and historical data from online servers, and further time to process the data and run predictive models to produce the final results. Based on testing, we anticipate that the response time will still be reasonable unless users frequently toggle between the reactive input fields. This is unlikely, given that most investors would have already set their sights on a few specific cryptocurrencies that they wish to analyze.

8.3 Other factors influencing cryptocurrency market

Despite our best efforts in capturing both sentiment and technical data, cryptocurrency prices can still be influenced by a variety of other factors, such as government regulations (e.g. bans), exchange-related information, or large buys and sells by crypto whales. However, sentiment and technical data are likely the most critical to cryptocurrency analysis, and also the most beneficial to rookie traders who desire more information to supplement their investment decisions.

9. Future Plans

9.1 Producing Detailed Recommendations

Currently, due to the scale of our project, we focus mainly on displaying insightful information for investors to analyze themselves. Moving forward, we could consolidate information from the multiple sources to produce a greater variety of portfolio recommendations for investors. This would, however, entail significant risk, since our recommendations would be scrutinized more closely for how accurately they predict actual market conditions.

9.2 Additional Technical Indicators

There are still additional technical indicators that we can include to further supplement analyses. For example, we can also explore plotting centrality scores for cryptocurrencies to determine if there are price bubbles that investors should be careful about. In addition, we can also provide an additional dashboard for investors to be able to actively track their cryptocurrency returns, based on their portfolio composition inputs.

10. Conclusion

In conclusion, we believe that our app enables rookie investors to obtain more insightful data on the cryptocurrency market. Despite the limitations addressed above, and given the scale of our project, this tool will be useful in supplementing investment decisions rather than replacing other tools altogether. As demand for cryptocurrencies continues to rise, we expect usage to rise in tandem, and our app can establish itself as one of the few offerings in the market that combine sentiment analysis with technical analysis.

11. References

Bourgi, S. (2021, July 26). 43% of Singaporeans own crypto, according to Independent Reserve Survey. Cointelegraph. Retrieved November 5, 2021, from https://cointelegraph.com/news/43-of-singaporeans-own-crypto-according-to-independent-reserve-survey.

Crawley, J. (2021, August 25). Cryptocurrency market will more than triple by 2030: Study. Coindesk. https://www.coindesk.com/markets/2021/08/25/cryptocurrency-market-will-more-than-triple-by-2030-study/.

Reinicke, C. (2021, July 20). Bitcoin’s latest slide erased nearly $90 billion from the cryptocurrency market. What investors should consider. CNBC. https://www.cnbc.com/2021/07/20/bitcoins-slide-erased-90-billion-in-market-value-what-to-consider-.html.

Facts & Factors. (2021, April 12). At 30% CAGR, cryptocurrency market cap size value surges to record $5,190.62 million by 2026, says Facts & Factors. GlobeNewswire News Room. https://www.globenewswire.com/en/news-release/2021/04/12/2208331/0/en/At-30-CAGR-CryptoCurrency-Market-Cap-Size-Value-Surges-to-Record-5-190-62-Million-by-2026-Says-Facts-Factors.html.

Iacurci, G. (2021, July 29). 13% of Americans traded crypto in the past year, survey finds. CNBC. https://www.cnbc.com/2021/07/23/13percent-of-americans-traded-crypto-in-the-past-year-survey-finds.html.

Kharpal, A., & Browne, R. (2021, May 19). Bitcoin plunges 30% to $30,000 at one point in wild session, recovers somewhat to $38,000. CNBC. https://www.cnbc.com/2021/05/19/bitcoin-btc-price-plunges-but-bottom-could-be-near-.html.

Kiderlin, S. (2021, May 15). Millennials’ love affair with crypto is burning red-hot right now - two out of three say it’s more attractive than ever. Here’s why they like it so much. Business Insider. https://markets.businessinsider.com/news/currencies/millennials-investing-crypto-boom-bitcoin-ether-cryptocurrency-young-investors-2021-5.

Lisa, A. (2021, June 29). Which countries are using cryptocurrency the most? Yahoo! Finance. https://finance.yahoo.com/news/countries-using-cryptocurrency-most-210011742.html#:~:text=Cryptocurrency%20and%20blockchain%20technology%20company,form%20of%20cryptocurrency%20as%20payment.

Nann, S. (2019, October 14). How does social media influence financial markets? Nasdaq. https://www.nasdaq.com/articles/how-does-social-media-influence-financial-markets-2019-10-14.

Sehl, K. (2020, July 14). Top twitter demographics that matter to social media marketers. Social Media Marketing & Management Dashboard. https://blog.hootsuite.com/twitter-demographics/.

TripleA. (2021, October 14). Global cryptocurrency ownership data 2021. TripleA. Retrieved November 5, 2021, from https://triple-a.io/crypto-ownership/.

12. Appendix

12.1 Reddit Sentiment Analysis

Since the methodology for sentiment analysis on Reddit comments is similar to that for Twitter, we only briefly touch on the procedure in this Appendix. The code is not run to generate output, as RedditExtractoR is not supported by RMarkdown.

12.2 Pulling Reddit Comments

We pull comments from Reddit, prioritising the threads with the highest numbers of comments. This step makes use of the RedditExtractoR package, which wraps the necessary API connections in simple, easy-to-use R functions. However, due to the longer processing time, we limit the number of threads that are processed to avoid a long wait on the user’s end.

r.urls <- find_thread_urls(
  keywords = "ethereum", sort_by = "top", subreddit = "ethereum", period = "week"
) %>% arrange(desc(comments)) %>% head(5)
r.content <- get_thread_content(r.urls$url) 
r.content$comments <- r.content$comments %>% select(author, date, upvotes, comment)
r.content$comments

12.3 Perform pre-processing on the comments

Comments are often extremely dirty and contain elements that add no value for analysis (e.g. emojis). Hence, proper pre-processing is important to make the comments as clean and useful as possible for further analysis. Here, we adapt the methodology that is briefly discussed in https://stackoverflow.com/questions/31348453/how-do-i-clean-twitter-data-in-r

# Function for cleaning of Reddit data, adapted from tweet-cleaning
clean_reddit <- function(x) {
  x %>%
          # Remove URLs
          str_remove_all(" ?(f|ht)(tp)(s?)(://)(.*)[.|/](.*)") %>%
          # Remove mentions e.g. "@my_account"
          str_remove_all("@[[:alnum:]_]{4,}") %>%
          # Remove punctuation
          str_remove_all("[[:punct:]]") %>%
          # Replace any newline characters with a space
          str_replace_all("\\\n", " ") %>%
          # Make everything lowercase
          str_to_lower() %>%
          # Remove any non alphabetical characters, including numbers
          str_replace_all("[^[:alpha:]]", " ") %>%
          # Remove non-english characters
          iconv(from="UTF-8", to="ASCII", sub="") %>% 
          # Remove unnecessary whitespace in between words
          str_replace_all("\\s+"," ") %>%
          # Remove any trailing whitespace around the text
          str_trim("both")
}
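To show the effect of these steps on a single comment, the sketch below re-implements the pipeline with base R string functions and applies it to a made-up comment. The function name `clean_text_base` and the sample text are hypothetical. The URL pattern here is deliberately narrower than the one above: it stops at the next whitespace rather than consuming the rest of the text.

```r
# Base R approximation of the cleaning steps, for illustration only
clean_text_base <- function(x) {
  x <- gsub(" ?(f|ht)tps?://\\S+", "", x, perl = TRUE)  # remove URLs
  x <- gsub("@[[:alnum:]_]{4,}", "", x)                 # remove mentions
  x <- gsub("[[:punct:]]", "", x)                       # remove punctuation
  x <- gsub("\n", " ", x, fixed = TRUE)                 # newlines to spaces
  x <- tolower(x)                                       # lowercase
  x <- gsub("[^[:alpha:]]", " ", x)                     # drop digits and symbols
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "") # drop non-ASCII characters
  x <- gsub("\\s+", " ", x)                             # collapse repeated whitespace
  trimws(x)                                             # trim both ends
}

# Made-up sample comment
raw <- "ETH is going to $5000!!\nSo bullish @crypto_fan, see https://example.com/eth"
cleaned <- clean_text_base(raw)
print(cleaned)
```

The price, the mention, the punctuation and the link are all stripped, leaving only lowercase words for tokenisation.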

# Clean text with above function
r.content$comments$comment <- r.content$comments$comment %>% clean_reddit

# Remove duplicated comments
r.content$comments <- r.content$comments[!duplicated(r.content$comments$comment), ]

# Remove rows with missing values, and comments left empty after cleaning
r.content$comments <- r.content$comments %>% na.omit() %>% filter(comment != "")

# Assign incrementing index column to keep track of comments
r.content$comments$comment_id <- seq_len(nrow(r.content$comments))

# Split string by word into separate rows
reddit_unnested <- r.content$comments  %>% unnest_tokens(word, comment) %>% select(author,date, word, comment_id)

# Remove any stop words
reddit_unnested <- reddit_unnested %>% anti_join(stop_words, by = "word")

# Remove words of two characters or fewer, since they are unlikely to be meaningful
reddit_unnested <- reddit_unnested[nchar(reddit_unnested$word) >= 3,]
reddit_unnested

12.4 Determine Sentiment Polarity

Now that we have pulled and processed the comments, we proceed to data visualisation. We first need to assign a sentiment score to each word in each comment. Sentiment polarity is calculated in the same manner as in the Twitter analysis outlined above.

# Get sentiment dictionary. Here, we are using the afinn dictionary, as it allows us to assign specific polarity scores.
sentiment_dataset <- get_sentiments("afinn")
# Merge with the sentiment dataset to obtain our sentiment scores for each word
afinn_sentiment <- reddit_unnested  %>% merge(sentiment_dataset, by = 'word')
# Sum up sentiment scores to get the sentiment polarity for the entire comment
afinn_sentiment <- afinn_sentiment %>% arrange(comment_id) %>% group_by(comment_id) %>% summarize(sentiment_polarity = sum(value)) %>%  merge(r.content$comments, by = 'comment_id') %>% select(comment_id, sentiment_polarity, author, date, upvotes, comment)
afinn_sentiment
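The join-then-sum logic can be seen on a toy example. The sketch below uses base R and a tiny hypothetical lexicon in the AFINN style. The words and scores are illustrative and are not taken from the actual AFINN dictionary.

```r
# One row per (comment, word), as produced by tokenisation
tokens <- data.frame(
  comment_id = c(1, 1, 1, 2, 2, 3),
  word = c("great", "project", "love", "scam", "terrible", "ethereum"),
  stringsAsFactors = FALSE
)

# Tiny hypothetical lexicon with integer polarity scores (illustrative values)
lexicon <- data.frame(
  word = c("great", "love", "scam", "terrible"),
  value = c(3, 3, -2, -3),
  stringsAsFactors = FALSE
)

# Inner join: only words that appear in the lexicon are kept
scored <- merge(tokens, lexicon, by = "word")

# Sum word scores within each comment to get its polarity
polarity <- aggregate(value ~ comment_id, data = scored, FUN = sum)
names(polarity)[2] <- "sentiment_polarity"
print(polarity)
```

Comment 3 contains no lexicon words, so it drops out of the result entirely rather than receiving a score of zero. The same applies to the merge used in the code above.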

12.5 Visualizing overall sentiment polarity by Comment ID

This replicates the visualisation described for Twitter in the earlier section.

label_wrap <- label_wrap_gen(width = 60)

afinn_sentiment_formatted <- afinn_sentiment %>% mutate(formatted_tweet = str_glue("Comment ID: {comment_id}
                                      Author: {author}
                                      Comment: {label_wrap(comment)}"))

gplot <- afinn_sentiment_formatted %>%
  ggplot(aes(comment_id, sentiment_polarity)) +
  geom_line(color = "#2c3e50", alpha = 0.5) +
  geom_point(aes(text = formatted_tweet), color = "#2c3e50") +
  geom_smooth(method = "loess", span=0.25, se=FALSE, color="blue") +
  geom_hline(aes(yintercept = mean(sentiment_polarity), linetype = "Mean Sentiment Polarity"), color="blue") +
  geom_hline(aes(yintercept = median(sentiment_polarity) + 1.96 * IQR(sentiment_polarity), linetype = "Median + 1.96 x IQR"), color="red") +
  geom_hline(aes(yintercept = median(sentiment_polarity) - 1.96 * IQR(sentiment_polarity), linetype = "Median - 1.96 x IQR"), color="red") +
  theme_minimal() + 
  theme(legend.direction = "horizontal", legend.background = element_rect(fill = "white", colour = "gray30")) +
  labs(title = "Sentiment Polarity for Ethereum by Comment ID [Reddit]", x = "Comment ID", y="Sentiment Polarity")

ggplotly(gplot, tooltip = "text") %>% layout(legend = list(orientation = 'h', xanchor = "center", x = 0.5, y= -0.5), xaxis = list(rangeslider = list(type="date")))

12.6 Visualizing overall sentiment polarity by Date

Next, it is also useful for users to see how the overall mean sentiment polarity across all comments changes over the most recent 7-day period. We adopt the same methodology outlined above for visualization by comment ID, except that grouping is done by date instead.

Here, we also plot an additional graph of the prices of the selected cryptocurrencies, so that users can perform a comparison between the mean sentiment changes and the closing prices.

coin_history_df <- coin_history(coin_id = "ethereum", vs_currency = "usd", days = 10)
coin_history_df$date<- substr(coin_history_df$timestamp,1,10)
coin_history_average <- coin_history_df%>% group_by(date) %>% summarise(average_price = mean(price)) %>% arrange(date)

mean_over_time <- afinn_sentiment %>% group_by(date) %>% summarize(mean_sentiment = mean(sentiment_polarity)) %>% mutate(formatted_text = str_glue("Date: {date}
          Mean Sentiment Score: {round(mean_sentiment, 2)}"))

merged_sentiment <- merge(mean_over_time, coin_history_average, by = "date")
merged_sentiment <- merged_sentiment %>% select(c(1,3,2,4)) %>% gather("type", "value", 3:4)

gplottime <- merged_sentiment %>% ggplot(aes(x=date, y=value)) +
  geom_line(size=2, alpha=0.9, color="blue", group = 1) +
  geom_point(aes(text = formatted_text), color = "#2c3e50") +
  theme(text = element_text(size=13)) +
  labs(
    x = NULL, y = NULL,
    title = "Avg Price of Coin vs Mean Sentiment Score of Comments",
    subtitle = paste("Comments collected from", (Sys.Date() - 6) , "to" , Sys.Date())
  ) + facet_wrap(~factor(type), scales="free") + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

gplottime

12.7 Most frequent words and wordcloud

Having observed sentiment polarity trends, potential investors will be keen to know about words that appear very frequently in the comments that we have analyzed. Hence, we want to pick out the top 15 words that appear with the highest frequency in our comments.

# Plotting bar graph of top 15 words, and word cloud
reddit_unnested %>%
  count(word, sort = TRUE) %>%
  head(15) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  theme(text = element_text(size=13)) +
  geom_col() +
  coord_flip() +
  # labs() refers to the aesthetics, so x is still the word axis after coord_flip()
  labs(x = "Unique words",
       y = "Count",
       title = "Most frequent words found in comments")
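For readers without the tidyverse loaded, the counting step behind this chart can be sketched in base R on a made-up vector of tokens. The words below are purely illustrative.

```r
# Made-up tokens standing in for reddit_unnested$word
words <- c("eth", "gas", "eth", "merge", "gas", "eth", "staking")

# Count occurrences of each word and sort from most to least frequent
freq <- sort(table(words), decreasing = TRUE)

# Keep the top 2 words (the report uses the top 15)
top_words <- head(freq, 2)
print(top_words)
```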

We can also visualize the top words in comments in a wordcloud.

set.seed(1234)
wordcloud(reddit_unnested$word, min.freq=10, scale=c(3.5, .5), random.order=FALSE, max.words=100, rot.per=0.35, colors=brewer.pal(8, "Dark2"))