GSI Technology

Application of AI to Cybersecurity—Part I
Introduction to Basic Concepts

Author: Sabrina Ho

Introduction

In this multi-series blog, I will be reviewing the paper — Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection. This paper talks about the application of AI to cybersecurity including malware detection, vulnerability search, and more. Before we get into that, there are some basic concepts we need to understand first:

♦ Control flow graph: What is a control flow graph? And how does it relate to software?

♦ Firmware: It’s a special type of software that you find in lots of places, but why is it important in a cybersecurity context?

♦ Graph embeddings: This is a new technique for taking highly unstructured data (graphs,) and transforming it to a compact vector. We will explore a popular technique to accomplish this.

Control flow graph

What are control flow graphs? If you have ever written any programs, you can represent them as control flow graphs. When I first started learning Python, I had to make a graph for every single program I wrote. Here is a simple program with its control flow graph on the right:

But there are other control flow graphs for other simple programs, here are some additional examples:

**(A) A while loop; (B) A natural loop with two exits; (C) A loop with two entries**

However, in reality, control flow graphs can be super complex, just like the one below. This is a control flow graph for a simple piece of low-level software written in assembly language (for more information, go here):

Because code can be represented as graphs, you might consider using graph matching algorithms to assess code similarity. Code similarity techniques have been used to detect code plagiarism and here we are evaluating its utility in detecting malware or vulnerabilities. Unfortunately, graph matching can be very expensive. Its time complexity and cost can be as bad as O(n³), which is inevitably slow and can sometimes be inaccurate.

Instead, we can convert graphs into embeddings and perform code similarity and code search by just using those embeddings, which is much faster. What kind of code would be useful to turn into embeddings, and why?

Firmware

Firmware is one kind of software that can be found in all kinds of devices — from your cell phone to IoT. Because these devices are cheap and many of them are not going to be secured, they have become the largest target for hackers. The image below shows the example devices that contain firmware:

It would be cool if there was a fast and simple way to determine whether a piece of firmware is similar to malware or to code that is validated as vulnerable. That way we can quickly determine if the software is malicious or vulnerable to hacking. We can do this by translating the code into embeddings via its control flow graph.

Graph embeddings

So, how do we get an embedding from a graph such as control flow graph? We’ve seen graph embeddings in my previous blogs where I turn molecules into vectors. Many kinds of graphs can be turned into graph embedding. It is a powerful technique to understand social networks, molecules, neural networks, and lots of other systems. The image below is an example of a molecule converted into a vector form:

Graph embeddings can help us quickly assess the similarity of graphs without the cost of graph matching. Here is another example of a graph—a social network:

**A graph containing thousands of connected nodes**

The previous two images shown above are graphs. Yet, the social network graph is a lot more complicated than the molecule graph.

There are many techniques to convert a graph into embeddings, and DeepWalk is one of them. The image below shows the phases of DeepWalk approach:

First, we take random walks on the graph to produce fixed-length sample paths. If we take enough random walks, we can accumulate a data set that resembles the patterns in the graph. Then, we can use this as the training data for machine learning, which learns the pattern and encodes them into embeddings such as vectors. (For more information on DeepWalk, go here)

Conclusion

Now that we have a good understanding of basic graph concepts, we can dive deeper into the paper that applies artificial intelligence and similarity search to malware and vulnerability detection in the next blog of the series.

High Performance, High Reliability, Low Power

The cornerstone of similarity search

Data to Knowledge to Insight to Action

Technology that benefits human health

Safety and Security

Hosted services for GSI APU applications

High performance compute-in-memory

Bringing real-time datacenter SAR processing to mobile applications

Next Gen Navigation

Can’t hide from the APU

Personalized, relevant search through natural language

Efficiently processing big data for A&D applications

Bringing real-time datacenter SAR processing to mobile applications

APU finds the needle in the haystack

Next Gen Navigation

Advanced threat detection

From stringent space-grade memory to AI acceleration

From stringent space-grade memory to AI acceleration

The highest performance, highest density monolithic SRAMs

From stringent space-grade memory to AI acceleration

From stringent space-grade memory to AI acceleration

Visionary thought leadership

AI’s Energy Dilemma

Create, maintain, debug

Application of AI to Cybersecurity—Part I
Introduction to Basic Concepts

Introduction

Control flow graph

Firmware

Graph embeddings

Conclusion

Company Info

AI & Compute

Aerospace & Defense

Memory

Quick Links

FOLLOW US

High Performance, High Reliability, Low Power

The cornerstone of similarity search

Data to Knowledge to Insight to Action

Technology that benefits human health

Safety and Security

Hosted services for GSI APU applications

High performance compute-in-memory

Bringing real-time datacenter SAR processing to mobile applications

Next Gen Navigation

Can’t hide from the APU

Personalized, relevant search through natural language

Efficiently processing big data for A&D applications

Bringing real-time datacenter SAR processing to mobile applications

APU finds the needle in the haystack

Next Gen Navigation

Advanced threat detection

From stringent space-grade memory to AI acceleration

From stringent space-grade memory to AI acceleration

The highest performance, highest density monolithic SRAMs

From stringent space-grade memory to AI acceleration

From stringent space-grade memory to AI acceleration

Visionary thought leadership

AI’s Energy Dilemma

Create, maintain, debug

Application of AI to Cybersecurity—Part I Introduction to Basic Concepts

Introduction

Control flow graph

Firmware

Graph embeddings

Conclusion

Company Info

AI & Compute

Aerospace & Defense

Memory

Quick Links

FOLLOW US

Application of AI to Cybersecurity—Part I
Introduction to Basic Concepts