Analysis of cognitive warfare and information manipulation in the Israel-Hamas war 2023

Introduction

Cognitive warfare, information warfare, and disinformation are increasingly being used to undermine democratic societies. These tactics can sow discord, weaken public trust in institutions, and influence public opinion. This article uses AI technology to analyze the cognitive warfare and information manipulation surrounding the Israel-Hamas war in 2023, and the empirical evidence indicates that these approaches were employed even before the conflict broke out.

Taiwan AI Labs is the first open AI research institute in Asia focused on trustworthy AI computing, responsible AI methodologies, and delivering rights-respecting solutions. As a team with a keen interest in the intersection of social media, artificial intelligence, and cognitive warfare, we are acutely aware of the profound impact of these domains on democratic societies. In this context, our objective is to provide an in-depth analysis of how cognitive warfare, information warfare, and the propagation of misinformation affect the fabric of democratic societies. For example, our team found disinformation about the Israel-Hamas conflict on TikTok as early as June 2023.

This article takes the Israel-Hamas war in 2023 as a case study to shed light on the use of artificial intelligence in unveiling the intricate web of influence campaigns that precede and accompany armed conflicts. This study investigates how various state and non-state actors strategically conduct cognitive warfare and information operations in the digital sphere before and during conflicts. The analysis delves into the manipulation of suspicious accounts across diverse social media platforms and the involvement of foreign entities in shaping the information landscape.

This research’s findings are insightful and carry significant implications for national security, counter-terrorism efforts, and cognitive warfare strategies. Understanding the dynamics of cognitive and information warfare is paramount in countering external influences and safeguarding the integrity of democratic processes. This study serves as a foundation for future endeavors to develop observation indicators for counter-terrorism and cognitive warfare, ultimately contributing to the preservation of democratic values and the well-being of societies.

In a world where information is a potent weapon, this research endeavors to unveil the intricate strategies at play in the digital realm, shedding light on the dynamics that can potentially shape the future of democratic societies. For example, authoritarian regimes such as China are using generative AI to manipulate public affairs globally, especially in democratic societies [1,2]. With this foundation, we embark on a journey to comprehend, analyze, and counter the evolving landscape of cognitive warfare and information operations.

Methodology

Taiwan AI Labs uses its analytical software “Infodemic” to investigate information operations across multiple social media platforms. The algorithms are detailed below.

Analytics data coverage and building similarity nodes between user accounts

1. Analytics data

In the scope of this research, we conducted a comprehensive analysis of 71,774 dubious user accounts sourced from a diverse array of social media platforms, including YouTube, X (Twitter), and PTT, the largest online forum in Taiwan. We organized these suspicious accounts into 9,737 distinct coordinated groups using an advanced user clustering algorithm.

The central aim of this scholarly investigation was to delve into the strategies employed by these coordinated groups in the realm of information manipulation, with a particular focus on their activities related to the Israel-Hamas war in 2023. Our analytical efforts encompassed the scrutiny of 6,942 news articles, 63,264 social media posts, and 942,205 comments published between August 10 and October 10, 2023.

 

2. Analysis Pipeline

Figure 1: An overview of the coordinated behavior analysis pipeline

Figure 1 illustrates the analysis pipeline of this report, consisting of three components:

  • User Features Construction: We analyze and quantify the behavioral characteristics of a given user and transform these features into user vectors.
  • User Clustering: Leveraging the user vectors, we build a network of related users and apply a community detection algorithm to identify highly correlated user groups, and categorize them as collaborative entities for further analysis.
  • Group Analysis: We delve into the operational strategies of these collaborative groups, examining aspects such as the topics they engage with, their operation methods, and their tendencies to support or oppose specific entities.

In the subsequent sections, we will provide detailed explanations of each of these components.

 

User Feature Construction

To capture user information on social forums effectively, we propose two feature sets:

  • User Behavior Features
    Data preparation for user behavior features is a critical step in extracting meaningful insights from the given dataset, which encompasses a wide range of information related to social posts (or videos) and user interactions.
    We collected a broad spectrum of raw social data, which was then transformed into a series of columns representing user behavior features, such as the ‘destination of user interactions’ (post_id or video_id), the ‘time of user actions’, and the ‘domain of links shared by users’. These user behavior features are further transformed and organized for use in user similarity evaluation and clustering.
  • Co-occurrence Features
    The purpose of co-occurrence features is to identify users who frequently engage with the same topics or respond to the same articles. We employ Non-Negative Matrix Factorization (NMF) [4] to quantify co-occurrence features among users.
    NMF is a mathematical technique used for data analysis and dimensionality reduction that decomposes a given matrix into two or more matrices such that all elements in these matrices are non-negative.
    Specifically, to construct the features for M users and N posts, we build an M * N dimensional relationship matrix recording each user’s engagement with various posts. We then apply NMF to decompose this matrix and use the resulting user vectors as co-occurrence features for each user.
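As a rough illustration of this step, the sketch below builds a small user-post engagement matrix and factorizes it with scikit-learn's NMF. The toy data, column names, and factorization rank are illustrative assumptions, not the production configuration.

```python
# Sketch: co-occurrence features via NMF over a user-post matrix.
import pandas as pd
from sklearn.decomposition import NMF

# One row per (user, post) engagement event -- illustrative toy data.
interactions = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u3"],
    "post_id": ["p1", "p2", "p1", "p2", "p3"],
})

# Build the M x N relationship matrix (M users, N posts).
matrix = pd.crosstab(interactions["user_id"], interactions["post_id"])

# Decompose into non-negative factors; each row of W becomes a user's
# co-occurrence feature vector.
nmf = NMF(n_components=2, init="nndsvda", random_state=0)
W = nmf.fit_transform(matrix.to_numpy(dtype=float))  # M x k user vectors
H = nmf.components_                                  # k x N post factors

user_vectors = dict(zip(matrix.index, W))
```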

User Clustering

  • User Similarity Evaluation
    After completing the construction of user features, our next step involves evaluating the coordinated relationships between users. For behavioral features, we compare various behaviors between user pairs and normalize the comparison result to a range between 0 and 1. For instance, concerning user activity times, we record the activity times for each user within a week as a 7×24-dimensional matrix. We then calculate the cosine similarity between pairs of users based on their activity times. In the case of co-occurrence features, we use cosine similarity to assess the similarity between users’ co-occurrence vectors.
    By computing the cosine of the angle between these vectors, we can deduce the level of similarity between users’ responses or actions. This technique is notably useful in the study of social media, where it permits grouping users according to common behavior patterns [3]. User pairs with high cosine similarity exhibit highly coordinated patterns of behavior.
  • User Clustering
    After constructing pairwise similarities among users based on their respective features, we establish an edge between each pair of users whose similarity exceeds a predefined threshold, thereby creating a user network. We then apply Infomap to cluster this network. Infomap is an algorithm for identifying community structures within networks using information flow. The communities detected in this network are treated as coordinated groups in the following sections (see the sketch below).

 

Group Analysis

  • Opinion Clustering
    To efficiently understand the narratives proposed by each user group, we employed a text-cluster grouping technique on comments posted by coordinated groups. Specifically, we leveraged a pre-trained text encoder to convert each comment into vectors and applied a hierarchical clustering algorithm to cluster relevant posts into the same group, which would be used in the following analysis.
  • Stance Detection and Narrative Summary
    Large Pretrained Language Models have demonstrated their utility in extracting entities mentioned within textual content while simultaneously providing relevant explanations [5]. This capability contributes to the comprehension of pivotal elements within the discourse, particularly in understanding how comments and evaluations impact these identified entities.
    In this report, we use Taiwan LLaMa for text analysis. Taiwan LLaMa is a large language model pre-trained on a native Taiwanese language corpus. In our evaluation, it has shown remarkable proficiency in understanding Traditional Chinese, and it excels in identifying and comprehending Taiwan-related topics and entities. More specifically, we leverage Taiwan LLaMa to extract vital topics, entities, and organizational names from each comment. We then ask it to assess the comment author’s stance on these entities, categorizing each as positive, neutral, or negative. This process is applied to all opinion clusters.
    Finally, we calculate the percentage of each main topic/entity mentioned in the opinion group and the percentage of positive/negative sentiment associated with each topic/entity, and we generate summaries for each opinion cluster using the LLM, helping data analysts grasp the overall picture of an event efficiently. A minimal sketch of this pipeline appears below.
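The sketch assumes a generic sentence-transformers encoder and a hypothetical LLM client; the checkpoint name, distance threshold, and prompt wording are illustrative, not the team's actual configuration.

```python
# Sketch: opinion clustering over comment embeddings, then a stance-
# extraction prompt per comment. Model names and the prompt are assumed.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

comments = [
    "Israel is to blame for the violence in the West Bank.",
    "The US keeps funding Israel no matter what.",
    "Hamas attacks are acts of terrorism.",
]

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
vectors = encoder.encode(comments)

# Hierarchical clustering; distance_threshold controls granularity instead
# of fixing the number of clusters up front.
clusters = AgglomerativeClustering(n_clusters=None, distance_threshold=5.0)
labels = clusters.fit_predict(vectors)

# Stance extraction prompt (the report sends comments to Taiwan LLaMa and
# parses entity/stance pairs from the response).
PROMPT = (
    "List the entities mentioned in the comment below, and label the "
    "author's stance toward each as positive, neutral, or negative.\n"
    "Comment: {comment}"
)
requests = [PROMPT.format(comment=c) for c in comments]
# responses = [llm.generate(r) for r in requests]  # hypothetical LLM call
```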

Result

In this incident, we discovered that troll groups concentrated on different information manipulation patterns before and after the Hamas attack on October 7th:

The manipulation pattern before the Hamas attack

Tracing back to August 24th, Israel’s far-right National Security Minister, Itamar Ben-Gvir, stated that Israeli rights trump Palestinian freedom of movement, asserting that his right to move around unimpeded is superior to the freedom of movement of Palestinians in the occupied West Bank and sparking outrage.

Figure 2: A beeswarm plot showing the timeline of the stories after ‘Ben-Gvir says Israeli rights trump Palestinian freedom of movement.’

* Each Circle Represents an Event related to this manipulated story
** The Size of each circle is defined by the sum of the social discussion of that Event
*** The Darker the circle is, the Higher the proportion of troll comments in the Event

 

This study clustered the manipulated comments from the coordinated troll groups into narratives on the story events above. Israel emerged as the most manipulated entity, cast in a negative light, with accusations pointing to Israel as the culprit behind the tragedy in the West Bank. Meanwhile, as the USA funded and supported Israel, coordinated troll groups also accused the USA of encouraging Israel’s apartheid policy.

Table 1: Aimed entities and summary of narratives manipulated by coordinated troll groups in August.

On September 19th, Israeli Prime Minister Benjamin Netanyahu arrived in the U.S. for a meeting with President Joe Biden amid unprecedented demonstrations in Israel against a planned overhaul of Israel’s judicial system. Also on the agenda is a possible U.S.-brokered deal for normalization between Israel and Saudi Arabia. This study also discovered a series of manipulations from troll groups.

Figure 3: A beeswarm plot showing the timeline of the stories after ‘Netanyahu Prepares for Much-Anticipated Meeting With Biden.’

* Each Circle Represents an Event related to this manipulated story
** The Size of each circle is defined by the sum of the social discussion of that Event
*** The Darker the circle is, the Higher the proportion of troll comments in the Event

In these events, the study identified two contrasting manipulation methods: one involving criticism of Israel and Joe Biden, and the other showing support for Israel, reflecting the overwhelming American support for the country. The majority of manipulations by coordinated troll groups focused on accusing Israel of human rights abuses and of operating an apartheid regime in the Israeli-Palestinian conflict. These groups also criticized Joe Biden’s handling of the situation, attributing it to administrative incompetence, and blamed him for a domestic economic downturn.

Table 2: Aimed entities and summary of narratives manipulated by coordinated troll groups in September

In contrast with the pattern after the Hamas attack on October 7th, this study revealed that in August and September coordinated troll groups blamed Israel for its apartheid policies and encroachment on the West Bank. Viewed through the lens of information manipulation, these narratives may have served to pre-justify the subsequent attack on Israel. Information manipulation by coordinated troll groups could therefore be a leading indicator of incoming conflicts.

 

The manipulation pattern after the Hamas attack

AI Labs consolidates similar news articles into events using AI technology and visually represents these events on a timeline using a beeswarm plot. Within this representation, each circle signifies the social volume of an event. The color indicates the percentage of troll activity, with darker shades signifying higher levels of coordinated actions.

Following the incident on October 7, the event timeline is depicted in the figure below. Notably, after Hamas’ attack on Israel, there was a significant spike in activity, with varying degrees of coordinated actions across numerous events.

Figure 4: A beeswarm plot showing the timeline of the story events of the Hamas attack

* Each Circle Represents an Event related to this manipulated story

** The Size of each circle is defined by the sum of the social discussion of that Event

*** The Darker the circle is, the Higher the proportion of troll comments in the Event

AI Labs categorizes these events based on different social media platforms, with the analysis as follows:

 

Case study 1: YouTube

From the data AI Labs collected on YouTube, there are 497 videos with 175,072 comments. Among these, 681 comments are identified as coming from troll accounts, accounting for approximately 0.389% of the total comments. There are 64 distinct troll groups involved in these operations.

The timeline for activity on the YouTube platform is as follows. The most manipulated story was “Israel’s father-to-be was called up! Before going to the battlefield, ‘the pregnant wife hides her face and cries,’ the sad picture exposed.” For this event, the level of troll activity on the YouTube platform was 15.79%.

Figure 5: A beeswarm plot showing the timeline of the most manipulated story on YouTube.

* Each Circle Represents an Event related to this manipulated story

** The Size of each circle is defined by the sum of the social discussion of that Event

*** The Darker the circle is, the Higher the proportion of troll comments in the Event

 

Utilizing advanced AI analytics, AI Labs conducted an in-depth examination of the actions taken by certain coordinated online groups. The analysis revealed that the primary focuses of these operations encompass subjects like Israel, Hamas, and Palestine. Specifically:

14% of the troll narratives aim to attack Israel, encompassing criticisms related to its historical transgressions against the Palestinian populace. Within this discourse, there is also a prevalent assertion that Western nations display a marked hypocrisy by turning a blind eye to Israel’s endeavors in Palestine. Further, there’s an emergent speculation insinuating Israel’s clandestine endeavors to instigate U.S. aggression towards Iran.

6.5% of the troll narratives direct attention to attacking Hamas. The predominant narratives label Hamas’s undertakings as acts of terrorism, accompanied by forewarnings of potential robust retaliations. Additionally, these narratives implicate Hamas in destabilizing regional peace, with indirect allusions to Iran as the potential puppeteer behind the scenes. 

5.8% of the troll narratives demonstrate a clear pro-Palestine stance. Within this segment, there’s a prevalent contention that narratives supporting Israel apply a double standard. Moreover, there’s an emerging narrative painting the conflict as a collaborative false flag orchestrated by Israel and the U.S. It’s noteworthy that the slogan “Free Palestine” frequently punctuates these expressions.

Table 3: Aimed entities and summary of narratives manipulated by coordinated troll groups on YouTube

In addition to the aforementioned, AI Labs utilized sophisticated clustering algorithms to discern predominant narratives emanating from these troll accounts. Preliminary findings indicate that on the YouTube platform, 10.3% of these narratives uphold the sentiment that “Hamas and Palestinians who endorse Hamas are categorically terrorists.” Meanwhile, 4.5% overtly endorse Israel’s tactical responses against Gaza. Of significant interest is the 1.9% narrative subset suggesting that “Hamas’s armaments are sourced from Ukraine”—a narrative intriguingly resonant with positions articulated by Chinese state-affiliated media outlets.

Table 4: Percentage of narratives and aimed entities manipulated by coordinated troll groups on YouTube.

Case study 2: X(Twitter):

AI Labs curated a dataset comprising 8,650 tweets and 61,568 associated replies. Within this corpus, replies instigated by the troll groups totaled 295, constituting approximately 0.479% of the overall commentary volume. In total, there were 34 distinguishable troll groups actively operational.

The timeline derived from the Twitter platform is presented subsequently. The events subjected to the most pronounced troll activities were “CUNY Law’s decision to suspend student commencement speakers following an anti-Israel debacle” and “Democrats cautioning Biden concerning the Saudi-Israel accord,” accounting for 9.09% and 6.50% of troll operations, respectively.

Figure 6: A beeswarm plot showing the timeline of the story events of the Hamas attack on X(Twitter).

* Each Circle Represents an Event related to this manipulated story
** The Size of each circle is defined by the sum of the social discussion of that Event
*** The Darker the circle is, the Higher the proportion of troll comments in the Event

 

In a comprehensive analysis conducted by AI Labs, troll groups on X(Twitter) were observed to predominantly focus their efforts on Israel, Biden, Hamas, and Iran. The breakdown is as follows:

20.3% of the troll narratives attack Israel, emphasizing that Israel has systematically perpetrated what can be characterized as genocide against the Palestinians, rendering it undeserving of financial backing from the U.S.

10.76% of the troll narratives attack Joe Biden, asserting that his allocation of $6 billion to Iran reflects fiscal and strategic imprudence and suggesting that this policy facilitates Iran’s provision of arms to Hamas.

Another 10.76% of the troll narratives attack Hamas, with the core narrative characterizing Hamas’ actions against Israel as acts of terrorism. Critiques against Iran made up 7.6%, predominantly accusing it of being a principal arms supplier to Hamas.

Table 5: Aimed entities and summary of narratives manipulated by coordinated troll groups on X(Twitter)

 

Leveraging clustering methodologies, AI Labs identified dominant narratives among these troll groups on X(Twitter). Approximately 11.5% of troll replies focused on the theme “Biden’s Financial Backing to Iran”, overtly criticizing Biden’s foreign policy decisions. An additional 7.3% of replies contained links underscoring the alleged disparity in casualty rates between the Gaza Strip and Israel. Moreover, 5.2% of the troll replies appeared to advocate for Trump’s Middle Eastern policies, positioning them as judicious and effective compared to current strategies.

Table 6: Percentage of narratives and aimed entities manipulated by coordinated troll groups on X(Twitter).

Case study 3: PTT

PTT is a renowned terminal-based bulletin board system (BBS) in Taiwan. Our team extracted data from 312 pertinent posts and 62,308 comments on PTT. Of these comments, those attributed to coordinated groups totaled 3,613, representing 5.8% of the overall comment volume. In total, there were 110 troll groups actively participating in discussions.

The temporal analysis on PTT reveals four major incidents with significant user engagement and heightened levels of coordinated activity. These are:

“Schumer meets Wang Yi, urges Mainland China to bolster Israel amidst the Israel-Palestine clash,” with troll activity accounting for 11.06%.

“Over 150 individuals abducted! Hamas spokesperson: Without Israel’s ceasefire, hostage negotiations remain off the table,” exhibiting a troll activity rate of 11.20%.

“Persistent provocations by Hamas! What deters Israel from seizing Gaza? Experts weigh in,” with a troll activity proportion of 9.97%.

“In the Israel-Palestine Confrontation, the mastermind’s identity behind Hamas’s terrorist attacks is revealed. Having evaded numerous assassination attempts in Israel, he has been physically incapacitated and wheelchair-bound for an extended period,” which saw a troll activity rate of 9.88%.

Figure 7: A beeswarm plot showing the timeline of the story events of the Hamas attack on PTT

* Each Circle Represents an Event related to this manipulated story
** The Size of each circle is defined by the sum of the social discussion of that Event
*** The Darker the circle is, the Higher the proportion of troll comments in the Event

 

Utilizing AI, AI Labs delved into the narratives of troll accounts on PTT, identifying their primary targets as Israel, Taiwanese society, and Hamas. Regarding Israel, troll discourse on PTT bifurcates into supportive and critical stances. Critical narratives against Israel constitute 10.1%, predominantly highlighting criticisms of Israel’s purported military prowess and alleged discrimination against Palestinians. On the other hand, troll discourse championing Israel represents 8.3%, primarily lauding Israel for its robustness and solidarity. Some voices even suggest that Taiwan could draw lessons from Israel, admiring Israel’s retaliatory methodologies and military discipline.

Given PTT’s status as a localized Taiwanese community platform, it’s unsurprising that discussions also encompass the Taiwanese societal context. A notable 4% of the troll narratives are laden with critiques regarding Taiwan’s perceived lack of unity and skepticism over its capacity to withstand potential Chinese aggression. Meanwhile, 2.5% of troll narratives attack Hamas, primarily decrying their tactics—like utilizing civilians as bargaining chips—which arguably escalate hostilities.

Table 7: Aimed entities and summary of narratives manipulated by coordinated troll groups on PTT

 

For narrative categorization, AI Labs employed clustering techniques to classify troll group discussions. Notably, 7.2% of discourse content pointed fingers at Israel, painting it as predominantly belligerent. Additionally, 4% highlighted accusations against Israel for allegedly conducting multiple raids on refugee camps in Jenin earlier that year, while 1.2% of the narrative indicated impending large-scale retaliatory actions by Israel. These threads predominantly hold Israel in a critical light. Conversely, 3.6% of discussions condemned Hamas for allegedly perpetrating terror and indiscriminate civilian harm, whereas 1.4% blamed Iran for purportedly financing Hamas.

Table 8: Percentage of narratives and aimed entities manipulated by coordinated troll groups on PTT

 

On YouTube, troll entities seem to bolster their engagement and visibility by posting comments that echo narratives familiar to Chinese state-affiliated media, such as claims that “Hamas procures weapons from Ukraine” or that “China can serve as a peace-broker.”

In addition, troll accounts on X(Twitter) appear to leverage the Israel-Palestine conflict as a fulcrum to agitate domestic political discussions, predominantly spotlighting criticisms against Biden’s Iran funding policy.

Figure 8: Troll accounts comment under media tweets to agitate domestic political discussions, predominantly spotlighting criticisms against Biden’s Iran funding policy.

 

In the case of the PTT, the most pronounced troll narratives seem to portray Israel as inherently aggressive, allegedly implicating them in early-year refugee camp raids. Interestingly, diverging from tactics observed on other platforms, troll entities on PTT chiefly underscore perceived fractures in Taiwanese societal cohesion and purported vulnerabilities against potential Chinese aggression.

Aggregating troll narratives across platforms, we discern that 6% of discussions mock Israel for its purported frequent aggressions against Palestine. Approximately 3.5% advocate for Palestinian statehood, and 3.0% categorize Hamas operations as terroristic. In contrast, 2.2% of discourses support Israel’s proposed actions.

Table 9: Percentage of narratives and aimed entities manipulated by coordinated troll groups on all platforms

 

Chinese state-affiliated media’s narratives and troll operations

AI Labs analyzed news from Chinese state-affiliated media outlets and observed that certain articles experienced extensive re-circulation within China. On October 8th, the Chinese Ministry of Foreign Affairs responded to the incident where Hamas attacked Israel, urging an immediate ceasefire to prevent further escalation of the situation. Later that day, a video titled “The Israel-Palestine War Has Begun! Who is the Hidden Hand Behind the Israel-Palestine Conflict?”(《經緯點評》以巴開戰了!谁是以巴戰爭的幕後黑手?) was released on the YouTube channel operated by David Zheng. This video echoed the sentiments expressed by the Chinese Ministry of Foreign Affairs, suggesting that “China can serve as a mediator between Israel and Palestine, becoming a peacemaker, thus positioning itself to have greater influence in the world.” There was noticeable activity by troll groups in the comments section of this video, aiming to amplify its reach and propagate its viewpoints.

Figure 9: Statement from the Chinese Ministry of Foreign Affairs and the YouTube video echo the narrative.


Figure 10: Troll accounts comment under the video “The Israel-Palestine War Has Begun! Who is the Hidden Hand Behind the Israel-Palestine Conflict?”(《經緯點評》以巴開戰了!谁是以巴戰爭的幕後黑手?) , aiming to amplify its reach and propagate its viewpoints.

* A comment with a colored background means the commenter is a troll account; the same color means they belong to the same group.

 

On October 10th, the Chinese Global Times cited a news piece from the Russian state-affiliated media Russia Today, which quoted Florian Philippot, the chairman of the French Patriot Party. The article highlighted allegations that weapons supplied by the US to Ukraine had surfaced in the Middle East and were now being used in violent confrontations, in quantities described as “vast.” This claim has been confirmed as false by the fact-checking organization NewsGuard Technologies. The narrative also manifested within the discourse of troll groups on YouTube, where about 1.9% of the discussions echoed this news item propagated by Chinese state-affiliated media. AI Labs postulates that this narrative might be a dissemination effort by China on behalf of Russia, aimed at undermining support for Ukraine from the US and its Western allies.

In early June 2023, Dr. Jung-Chin Shen, an Associate Professor at York University in Canada, observed numerous users discussing the Israel-Palestine conflict on the Chinese platform Douyin. In these videos, Israel appeared to be consistently retreating, suggesting that Palestine was on the verge of achieving victory in the war.

Figure 11: In early June 2023, screenshots of videos related to the Israel-Palestine conflict on Douyin. (Source: Jung-Chin Shen’s Facebook)

 

Discussion

The findings of this article indicate that cognitive warfare and information operations are becoming increasingly sophisticated and are being used to achieve a broader range of objectives. The research also indicates that information manipulation could be a leading indicator of future conflicts.

Specifically, the evidence showed that troll groups ran manipulation campaigns blaming Israel for its apartheid policy and encroachment on the West Bank in August and September 2023, right before the conflict began. This suggests that information manipulation was used to sow discord and weaken public trust in the Israeli government, which may have contributed to the outbreak of the conflict.

In addition, the research indicated that cognitive warfare and information operations were used to sow discord and weaken public trust in institutions. These tactics were used to exacerbate tensions between Israelis and Palestinians and to undermine the credibility of the Israeli and Palestinian governments.

These findings have implications for the future of cognitive warfare and information operations. The tactics are becoming more sophisticated and are being used to achieve a broader range of objectives, so governments and organizations need to be aware of the threat they pose and to develop strategies to combat them. Future research can investigate how foreign actors, including Iran, Russia, and China, were involved in the conflict, as these actors used social media to spread disinformation and propaganda supporting their interests.

This research is still in its early stages, but it has the potential to significantly contribute to our understanding of cognitive warfare and information operations. The findings of this research could be used to inform the development of new strategies to combat these threats and to protect democratic societies from their harmful effects.

 

Reference

[1] Tucker, P. (2023). How China could use generative AI to manipulate the globe on Taiwan. Defense One. https://www.defenseone.com/technology/2023/09/how-china-could-use-generative-ai-manipulate-globe-taiwan/390147/

[2] Beauchamp-Mustafaga, N., & Marcellino, W. (2023). The U.S. Isn’t Ready for the New Age of AI-Fueled Disinformation—But China Is. Time. https://time.com/6320638/ai-disinformation-china/

[3] Al-Otaibi, S., Altwoijry, N., Alqahtani, A., Aldheem, L., Alqhatani, M., Alsuraiby, N., & Albarrak, S. (2022). Cosine similarity-based algorithm for social networking recommendation. Int. J. Electr. Comput. Eng, 12(2), 1881-1892.

[4] Lee, D. D., & Seung, H. S. (2001). Algorithms for non-negative matrix factorization. In Advances in neural information processing systems (Vol. 13, pp. 556-562).

[5] Covas, E. (2023). Named entity recognition using GPT for identifying comparable companies. arXiv preprint arXiv:2307.07420.

 

Illustrating Microservices in Action: A Closer Look at the Task Manager Application

In today’s rapidly evolving technological landscape, software development approaches have undergone a paradigm shift. One of the most influential trends to emerge is the adoption of microservices architecture, a modern software design pattern that builds large, complex applications as a collection of small, independent, and loosely coupled services. Within this project, we employed this design pattern to circumvent race conditions in the server-side service as it scales.

The Task Manager is a component within the GenDiseak modules, tasked with handling the delegation of storage and analysis workflows (Figure 1). As the user base expands in the future, the GenDiseak service will require scaling up by deploying multiple replicas within the Kubernetes cluster to accommodate increasing requests. However, a challenge arises due to simultaneous database updates by these GenDiseak replicas, which can lead to potential race conditions (Figure 2).

Figure 1. Task Manager module in GenDiseak

Figure 2. Race conditions in replicas

To address the problem and prevent redundant task execution, the approach involves centralizing the Task Manager (Figure 3). You can visualize the Task Manager as a project manager and the GenDiseak replicas as the workforce. Upon receiving a task from a user or other processes, GenDiseak establishes a task entry in the database by interacting with the Task Manager. Subsequently, the Task Manager continuously scans for pending tasks and delegates them to the GenDiseak replicas (Figure 4).

Figure 3. Centralized Task Manager

Figure 4. Interaction between GenDiseak and Task Manager

Cron tasks form a specific task category. In the case of GenDiseak, a cron task periodically monitors the Illumina BaseSpace for uploaded files and initiates the analysis pipeline whenever new sequencing files are detected. Let’s illustrate how cron tasks collaborate with the Task Manager using an example. Initially, the Task Manager schedules the very first cron task. Once the scheduled time arrives, the initial task is assigned to GenDiseak for execution. When the task is completed, the Task Manager’s API is invoked to update the task status. This cycle continues for the subsequent tasks in a similar manner (Figure 5).

Figure 5. Cron tasks example
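The sketch below is a simplified, self-contained rendering of the scan-and-delegate cycle. All names are hypothetical; the point is the atomic status update that keeps two replicas from ever claiming the same task, which is exactly the race condition the centralized Task Manager removes.

```python
# Sketch: centralized Task Manager cycle with atomic task claiming.
import threading
from queue import Queue

class TaskStore:
    """Stands in for the shared task table; the lock plays the role of
    the database's atomic status update."""
    def __init__(self, tasks):
        self._lock = threading.Lock()
        self._status = {t: "pending" for t in tasks}

    def claim_pending(self):
        with self._lock:
            for task, status in self._status.items():
                if status == "pending":
                    self._status[task] = "running"   # atomic claim
                    return task
        return None

    def complete(self, task):
        with self._lock:
            self._status[task] = "done"

def task_manager(store, work_queue):
    """Scan for pending tasks and delegate them to worker replicas."""
    while (task := store.claim_pending()) is not None:
        work_queue.put(task)

store = TaskStore(["analysis-1", "analysis-2", "storage-1"])
queue = Queue()
task_manager(store, queue)
while not queue.empty():
    store.complete(queue.get())   # a replica reports completion
```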

Within the project scope, the Sidecar pattern was employed for the Task Manager and its corresponding database. Additionally, the Strangler Fig approach was leveraged to replace the original Task Manager within the GenDiseak system. To sum up, the adoption of the microservices design pattern presents a revolutionary method in software development. This approach empowers our platform services to construct scalable applications by disassembling monolithic structures into discrete, self-contained components.

From Novice to Dual Roles in Just One Month: Immersive Internship Experience in Both Product and Engineering Positions

Infodemic PM:

I began my internship journey at AI Labs within the Infodemic department as a Product Manager intern. I spent the initial days understanding our target clients and the specific services AI Labs and the Infodemic team wanted to deliver with this product. After gaining a baseline understanding of our product, I conducted a comprehensive competitive analysis to identify industry gaps and pinpoint opportunities, helping us carve out a unique value proposition.

During the next part of my internship, I attended business meetings with prospective clients from foreign countries. Based on insights from various stakeholders, potential partners, and my understanding of market needs, I crafted a product requirements document that delineated a clear user flow, ensuring our product would align with potential partners and their specific demands.

As my tenure with the Infodemic team came to a close and I transitioned into a different project, I compiled a glossary for the Infodemic.cc project. This document was intended to ensure continuity and shared understanding among team members working towards the product’s general release.

During my time as a product manager for the Infodemic team, I deeply value the insights and experience I gathered from managing different facets of the software release cycle for an inventive new product. I would like to give a special thanks to Chloe for being the best mentor I could have asked for. Her mentorship was pivotal when I was first joining AI Labs, teaching me so much and showing me what it means to be a proficient product manager.

Yating Music Studio SWE:

Upon joining the Yating Music Studio Project, I made it my first priority to understand our product’s market positioning. This involved learning about not only our own product’s offerings but also those of competing products with similar services. For new entrants into this industry, distinctly differentiating their products seemed pivotal to success, and even then, products unable to carve out a unique niche still seemed to struggle considerably.

With this new understanding, I spent some time testing the Music Studio application, subsequently putting together a prioritized list of items that I felt needed to be repaired or enhanced in order to realize our goals. I spent time tackling these priority items – enriching the user experience and modifying the application. In addition to tackling these priority items, I also set up test suites for the application using Google Test and Google Mock and repaired faulty source code, pushing the product closer to its general release.

I would like to extend a heartfelt thanks to Benson. He graciously welcomed me into his team, providing me with the opportunity to contribute and develop software for his project. His mentorship and trust in my ability are what allowed me to develop and grow immensely in my role.

Final Thoughts:

My internship at Taiwan AI Labs has been an incredibly enriching experience. I have had the privilege of being a part of two distinct software development teams, serving in two contrasting roles: product manager and software engineer. Contributing to a rapidly moving product from these two vantage points has provided me with invaluable insights into the challenges of each role and this dual perspective is something I will carry forward into all future projects.

Exploring Medical Frontiers: Deep Learning for ICU Tabular Data and Image Registration

Tabular data learning
Introduction

Artificial intelligence (AI) has significantly transformed the landscape of medicine. AI has proven its effectiveness in aiding clinicians by facilitating diagnoses, finding new treatment regimens, and even predicting disease prognosis. However, while AI models have been successful in interpreting medical images, the realm of medical tabular data, which is routinely collected for daily medical usage, remains largely unexplored and challenging for model training. Several key factors contribute to the difference between image and tabular data:

 

Contextual Information: Medical image data encapsulates a wealth of contextual information within its visual representations. On the contrary, tabular data presents a sparser landscape. Pixels within an image often exhibit intercorrelation with their neighboring pixels, so image models can learn meaningful features directly from the raw images. In contrast, the values in a column of a table do not inherently bear any guaranteed relation to adjacent columns. This inherent distinction necessitates extensive domain expertise to curate meaningful feature selection and engineering before effective model training.

 

Inherent Data Noisiness: While noise within medical images can be readily identified, localized, and rectified by individuals without advanced medical knowledge, challenges arise when handling tabular data. For instance, a layman might be able to identify values outside the normal range of blood pressure, yet may lack the awareness to rectify data where diastolic pressure is higher than systolic pressure. Such anomalies likely result from handwriting errors during data documentation and are intricate to correct if one does not even know the issue exists in the data.

 

Pervasive Data Sparsity: Medical data frequently presents high levels of sparsity. Tabular medical records are primarily recorded for medical purposes, and patients have distinct combinations of data because of their highly different medical statuses. Consequently, it’s common that features A, B, and C documented in patient X’s medical record can’t be found in patient Y’s records. In other words, a large share of missing values (around 75%-90% of the cells) is to be expected in the table.

 

In this article, we aim to address these three challenges by employing the contexts of intubation and sepsis prediction as illustrative examples. The dataset concerning intubation was fetched from the Taipei Medical University database, while the sepsis data was downloaded from the 2019 sepsis early prediction open dataset on Kaggle.

Method

Data collection:

Intubation: The intubation data was fetched and curated from the Taipei Medical University database, which resides on NAS. This dataset comprises 44 laboratory features (blood gas analysis, regular blood tests, renal profiles, metabolic exams, and liver functions) and 7 dynamic features (diastolic/systolic/mean arterial pressure, breath rate, heart rate, O2 saturation, and temperature). The curation of this data was based on medical knowledge. Extreme and unreasonable values (e.g., temperature = 370, which should be 37.0) were removed. The labels and timestamps indicating whether and when a patient was intubated were derived from the database’s billing time log. Following the patient selection pipeline, approximately 7,700 patients with qualified data were included, of which 3.9% were intubated (designated as the positive tag).

Sepsis: The sepsis dataset was downloaded directly from Kaggle. This dataset comprises 27 lab features, 7 dynamic features, and 5 static features (such as Age, Sex, ICU unit, etc.). The data was curated using medical knowledge, and extreme and unreasonable values were eliminated. Following the patient selection pipeline, approximately 40,000 patients with qualified data were included. Among them, 1.1% of the patients were labeled as positive (sepsis occurred within the stay in the ICU).

Data curation:

Data curation is one of the most challenging and pivotal tasks in tabular data learning, particularly when considering the ETL (Extract, Transform, Load) pipeline. An engineer responsible for data retrieval may not be familiar with distinctions between white blood cell (WBC) counts in urine, blood, or cerebrospinal fluid (CSF). Consequently, data might be extracted for various medical purposes and different value ranges, all of which could be placed within a single column.

 

A straightforward approach to identifying this issue is plotting the data distribution before any preprocessing or post-processing steps. This provides an initial impression of the data. Generally, most medical data should exhibit a unimodal distribution, either closely resembling a normal distribution or displaying left or right skewness. For instance, consider the distribution of recorded heart rates (HR):
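A minimal sketch of this first-impression step, using synthetic data and an illustrative column name:

```python
# Sketch: plot a feature's raw distribution before any cleaning.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({"HR": rng.normal(85, 15, 10_000)})

df["HR"].plot.hist(bins=100, title="Raw heart rate distribution")
plt.xlabel("HR (beats per minute)")
plt.show()
```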

Although some extreme values might attract one’s attention, the value 280 might result from a physically possible tachycardia event, and the minimum value of 28 could come from a dying patient. Do not eliminate data before a second inspection. Let’s dig out the patients generating these extreme values.

The data in the right plot is more comprehensible. This patient exhibits occasional tachycardia, likely due to medication. On the left plot, the patient’s data is considerably more intricate, displaying extremely high and low HR values. However, both scenarios are observed in clinical practice. As engineers, we should not cap or discard a single data point solely based on value thresholds. Instead, if we can afford to lose some patients from the data, I would drop the patient on the left-hand side from my training set.

 

Unfortunately, numerous potential scenarios demand thorough examinations. For instance, the distribution of PTT values exhibits an unusual spike at 150 and 250 units, which corresponds to the maximum time interval set by hospitals for blood coagulation measurement. On the contrary, the abnormal distribution of calcium levels results from erroneously recording calcium values from varied measurements and units—a significant oversight. The process of data cleaning proves to be intricate and laborious. It is my belief that the optimal strategy to navigate this challenging endeavor is through close collaboration with clinical professionals.

Modeling:

There are at least two ways to model the prediction questions.

  1. Static prediction (or left alignment): Based on the first 24 hours of data after entering the ICU, what is the probability of a patient developing sepsis during the ICU stay?
  2. Real-time prediction (or sliding window): Based on the data in the previous X-hour window, what is the probability of a patient developing sepsis in the next 24 hours?

XGBoost was selected for the left alignment problem due to its excellent ability to handle missing values. For the sliding window problem, we tested the commonly used LSTM model for multivariate time-series prediction.

(a) Observation window: the data used for building models. Prediction window: if an event occurs in this window, the instance (patient) will be considered positive. (b) Sliding window: the observation and prediction windows shift as time evolves. (c) Left alignment: all instances have the same starting point and the same size of observation window.

Image courtesy: https://www.nature.com/articles/s41746-021-00529-x

Feature augmentation and time compression:

For the left alignment task, a single instance can consist of several rows of data. One approach to summarize multiple data points within the observation window is through statistical summarization. To be more specific, we have chosen the minimum, average, and maximum values of a feature of an instance within the window to represent its behavior over this time span. This approach also offers the advantage that if a feature has at least one data point within the observation window, it will not result in troublesome missing values for the models later on.
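A condensed sketch of this summarization under stated assumptions (synthetic data, illustrative column names): each patient's observation window is collapsed into min/mean/max columns, and the remaining missing values are left for XGBoost, which handles them natively.

```python
# Sketch: min/mean/max aggregation per patient, then an XGBoost classifier
# that tolerates the remaining NaNs.
import numpy as np
import pandas as pd
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "patient_id": np.repeat(np.arange(100), 24),  # 24 hourly rows each
    "HR": rng.normal(85, 15, 2400),
    "SBP": rng.normal(120, 20, 2400),
})
raw.loc[rng.random(2400) < 0.8, "SBP"] = np.nan   # simulate sparsity

# One instance per patient: summary statistics over the 24-hour window.
features = raw.groupby("patient_id").agg(["min", "mean", "max"])
features.columns = ["_".join(col) for col in features.columns]

labels = np.zeros(100, dtype=int)
labels[:4] = 1                                    # ~4% positive rate

model = XGBClassifier(n_estimators=100)
model.fit(features, labels)                       # NaNs handled natively
```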

Baseline:

The current consensus for diagnosing sepsis is Sepsis-3, published in 2016. Under this consensus, a diagnosis of sepsis is considered when a person exhibits clear manifestations of infection and shows an increase in the SOFA score of 2 or more. It is worth mentioning that the Sepsis-3 consensus is only used for making current diagnoses, not for forecasting future occurrences. In our approach, we utilize a WBC level over 10,000/μl as a proxy for infection and estimate the SOFA score accordingly. We apply the Sepsis-3 criteria in an off-label manner to predict the occurrence of sepsis in the future and use the results as a baseline for comparison with our models.
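A toy sketch of this off-label baseline, assuming the SOFA estimate is already computed and using illustrative column names:

```python
# Sketch: flag a future sepsis case when WBC exceeds 10,000/ul (infection
# proxy) and the estimated SOFA score has risen by 2 or more.
import pandas as pd

def sepsis3_baseline(df: pd.DataFrame) -> pd.Series:
    infection = df["WBC"] > 10_000
    sofa_rise = (df["SOFA_est"] - df["SOFA_est"].iloc[0]) >= 2
    return infection & sofa_rise

patient = pd.DataFrame({"WBC": [8_000, 12_000, 13_000],
                        "SOFA_est": [2, 3, 5]})
print(sepsis3_baseline(patient))   # False, False, True
```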

Result

Intubation prediction, left alignment

From the table, we can see that the XGBoost model significantly outperforms random guessing (with a prevalence rate of 0.04). With this model, a doctor is capable of predicting whether a patient is highly likely to require intubation and ventilation during their ICU stay. The model performs even better when it can access the first 48 hours of data instead of only the first 24 hours. However, this improvement was not observed when we further extended the observation window from 48 to 72 hours.

 

Another important aspect is this model’s explainability. Let’s examine the feature importance:

The foremost influential factor (is_op) in forecasting intubation is whether the patient is admitted to the ICU after a major surgery. The second factor (RBC_max) is the maximal concentration of red blood cells in the data window, which is associated with the patient’s oxygenation ability. The third factor (BUN_min) is an indicator of a patient’s renal function, closely tied to the assessment of osmolality. The fourth factor (bf_min) is the slowest breathing rate recorded in the data window, which directly evaluates a patient’s lung function. All of these factors are medically connected to determining a patient’s need for intubation.

Sepsis prediction, left alignment

The second task is to build a model on a more homogeneous dataset and try to repeat the success of the intubation task.

A similar conclusion! The XGBoost model is again capable of forecasting sepsis occurrences, although the improvement from 48 to 72 hours is not prominent. When compared to the Sepsis-3 criteria, our XGBoost model clearly outperforms on all the metrics measured.

The feature importance highlighted systolic blood pressure (SBP). This leads us to speculate that the model aims to predict organ failure and the consequent hypotension (low systolic blood pressure). Lung function is additionally assessed via oxygenation level (O2Sat_min) and respiratory rate (Resp_min).

Sepsis prediction, sliding window

Finally, we challenged ourselves by transitioning to sliding window prediction. Sliding window, also known as real-time prediction, is of paramount importance in the clinical realm, as it offers instantaneous monitoring and awareness of sepsis occurrences.

 

Unlike the left alignment approach, sliding-window-based prediction is exceedingly challenging due to the highly imbalanced class labels. Consider a patient who was diagnosed with sepsis at the 72nd hour. In other words, when we break down this patient’s data into an hourly-based resolution, there are 71 negative data points and only 1 positive data point (the data points after the occurrence of sepsis are not significant and were not recorded in the dataset). Consequently, the sliding window model must detect subtle time-variant signals from an incredibly imbalanced dataset. In the end, merely 0.18% of the prediction cases were labeled as positive, establishing our random guessing baseline.

 

Our current model demonstrates a time-independent AUC of 0.72, albeit with a less favorable f1-score of 0.008 and an AUPRC of 0.008. At first glance, this might not appear overly promising. However, it represents a significant improvement over using the Sepsis-3 criteria for predictions, where the f1-score is 0.00, AUC is 0.50, and the AUPRC is 0.002 for the same task. When interpreting these results, it’s important to keep in mind that this model is aiming to detect the proverbial “needle in a haystack”—a target occurrence of one in a thousand. From this perspective, our model has performed fairly well. Nevertheless, there remains substantial room for improvement, and I’ll leave this final step for you to explore.

 

Image registration

Introduction

It is common for patients to have more than one image, either from the same or different modality. Clinicians can benefit from obtaining images from multiple modalities or time points to gain a holistic understanding of the patient’s condition. For example, a doctor may require daily X-rays to assess a patient’s pneumonia status. One pivotal aspect of this integration is image registration – a process that aligns multiple images of the same or different modalities into a unified coordinate space.

 

However, the integration of imaging data poses challenges: the images are often acquired in different positions, orientations, and scales, making direct comparisons or joint analyses cumbersome. This is where image registration emerges as a crucial solution. By computationally and automatically aligning images from different modalities or multiple time points, medical professionals can create a unified visualization that enables them to integrate information from various sources. Moreover, a deep learning model trained on these sets of images is likely to achieve higher performance and robustness.

 

Medical image registration is not a trivial process. Due to the intricacies of the human body, a simple affine transformation (translation and linear transform) is incapable of delivering satisfactory results. Moreover, learning targets for the ground-truth displacement field, a field (or matrix) that dictates how to shift the pixels from the source image to the target image, are rarely available. This characteristic renders supervised learning impractical. Most significantly, traditional registration methods typically require over 7 minutes to register a single image, making these conventional algorithms unlikely to be deployed for practical use.

 

VoxelMorph is a deep-learning-based registration model that directly learns the displacement field from the source and reference images. Unlike traditional methods, VoxelMorph can generate a registration displacement field for a 3D image within 0.1 seconds. At its core, VoxelMorph takes two images (source and reference) as inputs to its U-Net-like architecture and learns the relationship between the pixels of these two images. The output field can be applied directly to the images, as well as to associated image features (such as segmentation masks or key landmarks). Notably, VoxelMorph is an unsupervised model, making it ideal for handling diverse clinical scenarios.

This article tests VoxelMorph’s ability to register MR images and paves the way for our future use.

 

Method

Objective function

Courtesy: From the VoxelMorph paper.

 

The objective function for training a VoxelMorph model is simple. We are attempting to find a function Φ that can be applied to the source image (denoted as “m,” representing the “moving” image), such that the resulting image, m∘Φ, closely resembles the reference image, denoted as “f” (representing the “fixed” image). Additionally, a regularization term encourages the displacement field to stay smooth, preventing the model from overfitting to any single image pair.
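In code, the objective might look like the following PyTorch sketch: a mean-squared-error similarity term between the fixed and warped images plus a finite-difference smoothness penalty on the displacement field. Shapes and the lambda value are illustrative, and VoxelMorph itself offers several choices of similarity term.

```python
# Sketch: VoxelMorph-style loss = similarity(f, m warped by phi)
#         + lambda * smoothness(displacement field).
import torch

def registration_loss(fixed, warped, disp, lam=0.01):
    """fixed/warped: (B, 1, H, W) images; disp: (B, 2, H, W) field."""
    similarity = torch.mean((fixed - warped) ** 2)      # MSE term
    # Finite-difference spatial gradients of the displacement field.
    dy = torch.diff(disp, dim=2)
    dx = torch.diff(disp, dim=3)
    smoothness = torch.mean(dy ** 2) + torch.mean(dx ** 2)
    return similarity + lam * smoothness

fixed = torch.rand(1, 1, 64, 64)
disp = torch.zeros(1, 2, 64, 64, requires_grad=True)
warped = torch.rand(1, 1, 64, 64)   # in practice: warp(moving, disp)
loss = registration_loss(fixed, warped, disp)
loss.backward()
```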

Data collection

We downloaded the data directly from the Learn2Reg Challenge and selected the OASIS dataset for conducting the registration test. This dataset comprises 454 T1-weighted MRI images, and our objective is to register any two randomly chosen images and achieve anatomical alignment between them.

 

Result

The visualized result is presented in the figure below. As is evident, one of the major distinctions between the source images (moving images) and the reference images (fixed images) lies in the size of the ventricles, the black chambers within the brain. Upon applying the model to the images, the registered images (moved images) present ventricles of similar sizes to the reference images. Consider a scenario where ventricle volume estimation is crucial for diagnosis (e.g., enlarged ventricles possibly indicating brain fluid injury or blockage, such as CSF reabsorption issues); this registration process could significantly impact the model’s prediction accuracy and overall performance.

Beyond the visual results, we also calculated the Dice scores between the segmentation masks (important regions in the brain) of the registered images and the reference images. The average Dice score among the 49 patients in the test set is 0.976, verifying the success of our trained model.

 

Ensure High Availability for Services: Endpoint Monitoring with AWS

HA and Monitoring

Availability is a quality attribute that, in general, refers to the degree to which a system is in an operable state. High availability then describes the setup of infrastructure that prevents a single point of failure and decreases service downtime. At Taiwan AI Labs, the Engine team focuses on assuring not only developers but also end-users of a high level of service reliability and operational performance.

 

Monitoring is one of the practices that help us achieve high availability. We utilize production monitoring to learn how our services perform at runtime and to reduce the time to detect and the time to mitigate failures (TTD and TTM, respectively).

 

AWS Route53 Health Check

To perform production monitoring and maintain high availability at the same time, we need to ensure there will not be a single point of failure in our monitoring service. That is, if we ran the monitoring service on a single instance in a fixed location, a failure of that instance would mask failures of the monitored endpoints: we would never know whether the monitoring service and the endpoints were alive, because we would not receive any alerts.

AWS Route 53 then becomes our go-to option for monitoring external endpoints. Route 53 has health checkers located in multiple availability zones around the world. When we create a health check on Route 53, health checkers across the world start sending requests to the endpoint, determining whether the endpoint is healthy and operational based on response time and the number of responses within the failure threshold.

We utilized the “multiple availability zones” characteristic of the Route 53 health check to maintain high availability for our monitoring service. Even if a health checker in one AZ breaks down, the health checkers in other AZs will still send HTTP requests to the endpoint we are monitoring, ensuring high availability for our services. The basic architecture of our monitoring service is presented in the diagram below.

Figure 1. The architecture of the monitoring service

 

The Route 53 health check sends HTTP/HTTPS requests to the endpoints that we would like to monitor (e.g., ailabs.tw). We then connect each health check to a CloudWatch alarm. Whenever the Route 53 health check switches the endpoint’s state (e.g., from healthy to unhealthy, or back), the alarm on CloudWatch is triggered. To receive notifications when the alarm is triggered, we configure an SNS topic for the CloudWatch alarm and subscribe a Lambda function that routes the content of the alarm (in JSON format) to a Slack channel.
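A minimal sketch of such a Lambda handler, assuming the Slack webhook URL is supplied through an environment variable; the event fields used are the standard ones SNS delivers for a CloudWatch alarm.

```python
# Sketch: forward a CloudWatch alarm (delivered via SNS) to Slack.
import json
import os
import urllib.request

def handler(event, context):
    for record in event["Records"]:
        alarm = json.loads(record["Sns"]["Message"])
        text = (f"{alarm['AlarmName']} is now {alarm['NewStateValue']}: "
                f"{alarm['NewStateReason']}")
        payload = json.dumps({"text": text}).encode("utf-8")
        request = urllib.request.Request(
            os.environ["SLACK_WEBHOOK_URL"], data=payload,
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(request)
    return {"statusCode": 200}
```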

Figure 2. Health checks on CloudWatch Dashboard

 

IaC

To manage the infrastructure that we deploy on AWS more efficiently and automatically, we adopted the concept of Infrastructure as Code (IaC). Infrastructure as Code is the process of managing infrastructure in a file or files rather than manually configuring resources in a user interface[1]. It helped us to keep track of the resources that we’ve deployed so that we can control the budget and manage our resources more easily.

We developed a Terraform module that allows us to automatically provision 7 resources, including the IAM role, health check, CloudWatch alarm, SNS topic, topic subscription, Lambda, and Lambda permission, on AWS in a single command: `terraform apply`. Developers who would like to monitor an external webpage only need to specify the domain names, the port, the protocol (HTTP, HTTPS, or TCP), the webhook, and the name of their Slack channel in a configuration file (main.tf) and run `terraform apply`.

Figure 3. Example Usage of the module (written in HCL)

 

In addition to the Terraform module, the Engine team also developed a CI/CD pipeline for the repository that provisions resources on AWS. Once a user pushes their configuration file (main.tf) to the repository in GitLab, the CI/CD pipeline is triggered and runs automated scripts to build and test the configuration. After the configuration is validated, users can deploy the resources to AWS without installing Terraform on their local machines, since the deploy job is handled by the Docker executor on GitLab and Terraform is already installed on the Docker image. This CI/CD practice not only creates an effortless deployment flow for engineers but also ensures that all infrastructure deployed to AWS complies with Terraform best practices and Engine team standards.

 

Reference

[1] Infrastructure as Code with Terraform | Terraform – HashiCorp Learn. (2020). Retrieved 3 August 2020, from https://learn.hashicorp.com/terraform/getting-started/intro?_ga=2.74354834.1030159239.1596423434-594189736.1592893931