MalwareRL

April 2021

MalwareRL is a malware-manipulation environment built on OpenAI Gym, extending the work in "Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning" (Anderson et al., 2018). I picked it up because the original repo was unmaintained, used Python 2, and was pinned to an outdated version of LIEF. This project modernizes that code and adds further malware Gym environments and manipulations.

What it does

MalwareRL exposes Gym environments for both Ember and MalConv, letting researchers train RL agents to bypass static malware classifiers. The action space covers non-breaking modifications to PE structure: header tweaks, section additions, import injection, overlay padding, UPX pack/unpack, and so on.
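To make the episode structure concrete, here is a minimal sketch of a Gym-style loop over a discrete set of manipulations. It is a toy stand-in, not the MalwareRL API: the class ToyEvasionEnv, the action labels, the fake classifier score, the 0.5 threshold, and the reward value are all assumptions made for illustration. No PE file is parsed or modified.

```python
import random

# Action names paraphrase the manipulations listed above; they are
# illustrative labels, not the identifiers used in the MalwareRL code.
ACTIONS = [
    "modify_header",
    "add_section",
    "add_imports",
    "pad_overlay",
    "upx_pack",
    "upx_unpack",
]


class ToyEvasionEnv:
    """Gym-style stand-in: reset() returns an observation and step()
    returns (observation, reward, done, info). The observation here is
    just a fake classifier score, not a real feature vector."""

    def __init__(self, threshold=0.5, max_turns=10, seed=0):
        self.threshold = threshold  # assumed decision threshold
        self.max_turns = max_turns  # cap on modifications per episode
        self.rng = random.Random(seed)

    def reset(self):
        self.turns = 0
        self.score = 0.95  # fake initial maliciousness score
        return self.score

    def step(self, action):
        if not 0 <= action < len(ACTIONS):
            raise ValueError(f"unknown action index: {action}")
        self.turns += 1
        # Fake effect: every manipulation lowers the score by a random amount.
        self.score = max(0.0, self.score - self.rng.uniform(0.0, 0.2))
        evaded = self.score < self.threshold
        done = evaded or self.turns >= self.max_turns
        reward = 10.0 if evaded else 0.0  # arbitrary reward for evasion
        info = {"action": ACTIONS[action], "evaded": evaded}
        return self.score, reward, done, info


def run_episode(env, rng):
    """Play one episode with uniformly random actions."""
    env.reset()
    done = False
    history = []
    while not done:
        action = rng.randrange(len(ACTIONS))
        _, _, done, info = env.step(action)
        history.append(info["action"])
    return info["evaded"], history
```

An episode ends either when the score drops below the threshold (evasion) or when the turn budget runs out, so the length of the returned history is the number of modifications applied to that sample.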

Baseline results (Ember holdout, 250 random samples)

gym      agent        evasion rate  avg ep len
ember    RandomAgent  89.2%         8.2
malconv  RandomAgent  88.5%         16.33

A RandomAgent is provided as a fuzzer-style baseline; the goal of further agents is to minimize the number of modifications while maintaining evasion.
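As a rough illustration of how the two baseline metrics could be computed, the sketch below runs a fake random episode per sample and aggregates the outcomes. Everything here is hypothetical: evaluate and make_random_episode are names invented for this example, the per-step success probability is made up, and averaging episode length over all episodes (rather than only over evading ones) is an assumption about how "avg ep len" is defined.

```python
import random
from statistics import mean


def evaluate(run_episode, n_samples, max_turns):
    """Return (evasion_rate, avg_episode_length) over n_samples episodes.

    run_episode(sample_index, max_turns) is a hypothetical callable that
    returns (evaded, n_modifications) for one sample.
    """
    results = [run_episode(i, max_turns) for i in range(n_samples)]
    evasion_rate = sum(1 for evaded, _ in results if evaded) / n_samples
    # Assumption: length is averaged over every episode, evading or not.
    avg_len = mean(length for _, length in results)
    return evasion_rate, avg_len


def make_random_episode(success_prob, seed=0):
    """Fake episode runner: each random modification evades with a fixed
    probability. Stands in for a real environment and classifier."""
    rng = random.Random(seed)

    def run_episode(_sample_index, max_turns):
        for turn in range(1, max_turns + 1):
            if rng.random() < success_prob:
                return True, turn  # evaded after `turn` modifications
        return False, max_turns  # turn budget exhausted without evasion

    return run_episode


rate, avg_len = evaluate(
    make_random_episode(0.2, seed=42), n_samples=250, max_turns=10
)
print(f"evasion rate: {rate:.1%}, avg episode length: {avg_len:.2f}")
```

A learned agent would be judged on the same two numbers: a comparable or higher evasion rate with a shorter average episode means fewer modifications per successful evasion.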

Code: github.com/bfilar/malware_rl