Place all frequencies of nodes into a priority queue, by ascending order
Select two most minimum frequencies (delete min x2), sum these together, and insert a new frequency fw+1.
Use this to build internal nodes for a binary-tree and adds an addition bit to each of the subtrees involved, bottom up. Have leaf nodes contain the data at the bottom.
Stop once priority queue is <1 (the root node is reached).
Self Notes
Node represents both an internal node within the tree, as well as the base nodes for the original characters. I used two different constructors to identify them both.
Analyzing the file had just meant finding the frequency of characters, and then doing some math to discover the total number of bits used. I used the following calcuations to analyize the compression.
// original
bitlength = log_2(#unique_chars)
byte_size_per_char = bit_length * freq
total_byes = sum(byte_size_per_char)
// compressed
count_bits = #of bits to reach parent node from base
byte_size_per_char = count_bits * freq
total_bytes = sum(byte_size_per_char)
Initially I defined the base characters the base of the tree in base_nodes hashmap.
Inserting the node meant adding having two base nodes point to a combined parent node.
With id_char_bit_seq, I decided to not go the extra mile of hashing the user id. This is ok since the number of unique characters tends not to exceed 300 (ASCII).