Abstract:Concerning the input limitation of pre-training model, a long document needs to be spit into a set of text segments. The performance of long document classification is closely related to the further processing of the segment set and feature fusion. Existing document classification models keep more attention on the sequential of segments in the text segment set. However, the athors consider that the sequential order of segments have a mild influence on drawing the feature of a long document. The authors propose a BERT based long document classification model, which utilizes deep sets to obtain the collectionlevel feature from the segment set. In the model, the authors obtain a set of text segment features after BERT, and this proposed network which is immune to permutation learns the identical feature of the set to represent the long document feature. The accuracy of our model on the 20 newsgroups dataset achieved 90.82%, which outperforms the state-of-the-art method by 4.37%.