Huffman coding

Huffman coding is the process of encoding data with a Huffman code. A Huffman code is an optimal prefix code generated by an algorithm devised by David Huffman. The term Huffman coding covers both finding such a prefix code and using it.

Huffman coding is an entropy encoding method commonly used for lossless data compression. The output of the algorithm can be viewed as a variable-length code table for encoding source symbols. One typical application is encoding the characters in a file.

Huffman's algorithm derives this table from the estimated probability or frequency of occurrence, called the weight, of each possible value of the source symbol. As an entropy encoding method, Huffman coding represents more common symbols with fewer bits than less common symbols. The method can be implemented efficiently: when the input weights are already sorted, a code can be found in time linear in the number of weights. Nonetheless, although Huffman coding is optimal among methods that encode symbols one at a time, it is not always the optimal compression method overall.
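The last point can be made concrete with a small calculation. The Python sketch below compares the average length of a code with the entropy of the source. The two-symbol distribution and the names `entropy` and `average_length` are assumptions chosen for illustration, not part of any particular library or data set.

```python
import math

def entropy(probs):
    """Shannon entropy of the source, in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def average_length(probs, code):
    """Expected number of code bits spent per symbol."""
    return sum(p * len(code[s]) for s, p in probs.items())

# Hypothetical, heavily skewed source (assumed for illustration).
probs = {"a": 0.9, "b": 0.1}
# With only two symbols, a Huffman code assigns one bit to each.
code = {"a": "0", "b": "1"}

print(f"entropy        = {entropy(probs):.3f} bits/symbol")
print(f"average length = {average_length(probs, code):.3f} bits/symbol")
```

Under these assumed probabilities the entropy is roughly 0.47 bits per symbol, yet a code that assigns a whole number of bits to each symbol cannot spend less than 1 bit per symbol. Methods that encode several symbols together can close part of that gap.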


Huffman coding uses a specific method for choosing the representation of each symbol, and the result is a prefix code: the bit string that represents one symbol is never a prefix of the bit string that represents another. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is often used as a synonym for "prefix code", even when the code was not produced by Huffman's algorithm.
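The prefix property can be checked mechanically. The sketch below uses two made-up three-symbol tables to test whether any codeword is a prefix of another, and then decodes a bit string by reading it from left to right. The function names `is_prefix_free` and `decode` are illustrative assumptions.

```python
def is_prefix_free(code):
    """Return True if no codeword is a prefix of another symbol's codeword."""
    words = list(code.values())
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
    )

def decode(bits, code):
    """Decode a bit string with a prefix code, reading left to right."""
    inverse = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:          # a complete codeword has been read
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

code = {"a": "0", "b": "10", "c": "11"}   # hypothetical prefix code
bad = {"a": "0", "b": "01", "c": "11"}    # "0" is a prefix of "01"

print(is_prefix_free(code))     # True
print(is_prefix_free(bad))      # False
print(decode("010110", code))   # 0 | 10 | 11 | 0 -> abca
```

Because no codeword is a prefix of another, the decoder never has to look ahead or backtrack: the first codeword that matches is the only one that can match.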

Basic Huffman coding technique

The basic technique works by building a binary tree of nodes. The nodes can be stored in a regular array whose size depends on the number of symbols. A node is either a leaf node or an internal node. Initially, all nodes are leaf nodes, each containing the symbol itself and its weight. A leaf can also hold a link to its parent node, which makes it easy to read the code, in reverse, starting from the leaf.

Internal nodes, by contrast, contain a weight, links to two child nodes, and an optional link to a parent node. By common convention, bit ‘0’ represents following the left child and bit ‘1’ represents following the right child. A Huffman tree that omits unused symbols produces the optimal code lengths.
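The node layout and the left/right bit convention described above can be sketched in a few lines of Python. The `Node` class, the `assign_codes` function, and the hand-built three-symbol tree are assumptions made for illustration; the optional parent link is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    weight: float
    symbol: Optional[str] = None      # set only on leaf nodes
    left: Optional["Node"] = None     # followed on bit '0'
    right: Optional["Node"] = None    # followed on bit '1'

    def is_leaf(self):
        return self.left is None and self.right is None

def assign_codes(node, prefix="", table=None):
    """Walk the tree, appending '0' for a left step and '1' for a right step."""
    if table is None:
        table = {}
    if node.is_leaf():
        # A tree with a single symbol still needs one bit per symbol.
        table[node.symbol] = prefix or "0"
    else:
        assign_codes(node.left, prefix + "0", table)
        assign_codes(node.right, prefix + "1", table)
    return table

# Hand-built tree for hypothetical weights a=5, b=2, c=1.
tree = Node(8,
            left=Node(5, "a"),
            right=Node(3, left=Node(1, "c"), right=Node(2, "b")))

print(assign_codes(tree))   # {'a': '0', 'c': '10', 'b': '11'}
```

The heaviest symbol sits closest to the root and therefore receives the shortest codeword.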

The process begins with leaf nodes that contain the probabilities of the symbols they represent. It then takes the two nodes with the smallest probabilities and creates a new internal node with those two nodes as its children. The weight of the new node is the sum of the weights of its children. The two original nodes are no longer considered on their own, and the process is repeated on the new node and the remaining nodes until only a single node is left. That node is the root of the Huffman tree.

The simplest construction algorithm uses a priority queue in which the node with the lowest probability is given the highest priority. First, create a leaf node for each symbol and add it to the priority queue. Then, while the queue contains more than one node, remove the two nodes of highest priority (lowest probability), create a new internal node with these two nodes as children and with a weight equal to the sum of their weights, and add the new node to the queue. The single node that remains is the root node, and the tree is complete.
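The steps above can be sketched with `heapq`, the binary heap in Python's standard library. This is a minimal sketch rather than a definitive implementation: the function name `huffman_code`, the use of nested lists for internal nodes, and the sample text are all assumptions made for illustration.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(weights):
    """Build a code table from a {symbol: weight} mapping."""
    tiebreak = count()   # unique counter so equal weights never compare trees
    # Heap entries are (weight, tiebreak, tree). A tree is either a symbol
    # (leaf) or a two-item list [left, right] (internal node).
    heap = [(w, next(tiebreak), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    while len(heap) > 1:
        # Remove the two nodes of lowest weight (highest priority).
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        # The new internal node carries the sum of its children's weights.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), [t1, t2]))
    _, _, root = heap[0]

    table = {}
    def walk(tree, prefix):
        if isinstance(tree, list):
            walk(tree[0], prefix + "0")   # left child: bit '0'
            walk(tree[1], prefix + "1")   # right child: bit '1'
        else:
            table[tree] = prefix or "0"   # lone symbol still gets one bit
    walk(root, "")
    return table

weights = Counter("abracadabra")          # a=5, b=2, r=2, c=1, d=1
table = huffman_code(weights)
total_bits = sum(weights[s] * len(table[s]) for s in weights)
print(table)
print(total_bits)   # 23
```

The counter used as a tiebreaker is a design choice of this sketch: it prevents Python from trying to compare two trees when their weights are equal. Different tie-breaking rules can produce different code tables, but every Huffman code for the same weights has the same total encoded length.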

Efficient priority queue data structures require O(log n) time per insertion, and a tree with n leaves has 2n - 1 nodes, so this algorithm runs in O(n log n) time, where n is the number of symbols. If the symbols are already sorted by probability, a linear-time, O(n), method is available. This approach builds the Huffman tree using two queues.

For the linear-time approach, begin with as many leaves as there are symbols. Next, enqueue all leaf nodes into the first queue in ascending order of weight, so that the least likely symbol is at the head of the queue. While there is more than one node in the two queues combined, dequeue the two nodes with the lowest weight by examining the fronts of both queues, create a new internal node with those two nodes as children and the sum of their weights as its weight, and enqueue the new node at the rear of the second queue. The node that remains after the process is the root node.
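A minimal Python sketch of the two-queue method follows. It assumes the caller passes the leaves already sorted by ascending weight; the function name `huffman_tree_sorted`, the list-based tree representation, and the sample weights are illustrative assumptions.

```python
from collections import deque

def huffman_tree_sorted(leaves):
    """Build a Huffman tree from (weight, symbol) pairs sorted by ascending weight.

    Returns (weight, tree), where a tree is a symbol or a [left, right] list.
    """
    first = deque(leaves)   # leaf nodes, lightest at the head
    second = deque()        # internal nodes, appended in nondecreasing weight order
    if not first:
        raise ValueError("at least one symbol is required")

    def pop_lightest():
        # Only the two queue fronts need comparing; ties go to the first queue.
        if not second or (first and first[0][0] <= second[0][0]):
            return first.popleft()
        return second.popleft()

    while len(first) + len(second) > 1:
        w1, t1 = pop_lightest()
        w2, t2 = pop_lightest()
        second.append((w1 + w2, [t1, t2]))   # new internal node goes to the rear
    return second[0] if second else first[0]

# Hypothetical weights, already sorted in ascending order.
leaves = [(1, "c"), (1, "d"), (2, "b"), (2, "r"), (5, "a")]
weight, tree = huffman_tree_sorted(leaves)
print(weight)   # 11
print(tree)     # ['a', [['c', 'd'], ['b', 'r']]]
```

Each node is enqueued and dequeued at most once, and every step inspects only the two queue heads, which is where the linear running time comes from. Breaking ties in favour of the first queue is one possible choice made in this sketch.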

Huffman coding scales well across a wide variety of coding tasks. Applying it successfully, however, requires attention to the details described above. Once those technicalities are mastered, it can be used efficiently for a wide range of coding purposes.


The history of Huffman coding dates back to 1951. David Huffman and his classmates in an information theory course at MIT were assigned a term paper on the problem of finding the most efficient binary code. After much frustration in his search for the best code, Huffman hit upon the idea of using a frequency-sorted binary tree. With this method he was able to produce the most efficient code.

With that achievement, Huffman outdid his professor, Robert M. Fano, who had worked with the information theory pioneer Claude Shannon to develop a similar code. Huffman's method builds the tree from the bottom up instead of from the top down. In this way it avoids the major flaw of Shannon-Fano coding, which does not always produce an optimal code.