Abstract: Protocol clustering is an important step in protocol reverse engineering. To handle binary protocols more transparently and to cover a wider range of protocols, a binary protocol clustering method based on gene and protein bioinformatics is proposed, which clusters protocols directly from the raw message sequences. The method first converts each raw binary message into a quaternary gene form, uses a fast clustering method to compute the k-seed values of pairwise base combinations and generate a distance matrix, and applies UPGMA to compute the minimum-distance spanning tree, which yields the initial clusters. Each cluster of quaternary protocol messages is then converted into hexadecimal protein chains, a representation that carries more semantic information, and a clustering method based on an improved mBed algorithm clusters these chains with high precision. Tests in pure and mixed scenarios with known and unknown protocols show that the method clusters binary protocols efficiently and with high accuracy, and has high application value.
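The two encoding steps and the pairwise distance described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the bit-pair-to-base mapping (00, 01, 10, 11 to A, C, G, T), the rule that two bases form one hexadecimal "protein" symbol, the k-mer based distance standing in for the k-seed computation, and the function names `to_quaternary`, `to_hex_chain`, `kmer_distance`, and `distance_matrix`. The UPGMA and improved mBed stages are not shown.

```python
from collections import Counter

# Assumed mapping of 2-bit groups to bases: 00->A, 01->C, 10->G, 11->T.
BASES = "ACGT"


def to_quaternary(message: bytes) -> str:
    """Encode a binary message as a base-4 'gene' string, 4 bases per byte."""
    out = []
    for byte in message:
        for shift in (6, 4, 2, 0):  # most significant bit pair first
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def to_hex_chain(quaternary: str) -> str:
    """Assumed rule: each pair of bases becomes one of 16 'protein' symbols."""
    pairs = zip(quaternary[0::2], quaternary[1::2])
    return "".join(
        format(BASES.index(x) * 4 + BASES.index(y), "x") for x, y in pairs
    )


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """Illustrative distance: 1 minus the fraction of shared k-mers,
    normalised by the k-mer count of the shorter sequence."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    shared = sum((ca & cb).values())  # multiset intersection
    denom = min(sum(ca.values()), sum(cb.values()))
    return 1.0 - shared / denom if denom else 1.0


def distance_matrix(messages: list[bytes], k: int = 2) -> list[list[float]]:
    """Symmetric pairwise distance matrix over the quaternary encodings."""
    genes = [to_quaternary(m) for m in messages]
    n = len(genes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = kmer_distance(genes[i], genes[j], k)
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

In a fuller pipeline, a matrix like this would be passed to an average-linkage (UPGMA) hierarchical clustering routine to form the initial clusters before the protein-chain stage.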