Text similarity is a well-established NLP technique for automatically estimating how similar different texts are. Two texts may be deemed similar in essentially two ways: they use the same linguistic expressions (lexical similarity), or they talk about the same topics (semantic similarity). Text similarity is usually computed with statistical methods such as cosine similarity, N-gram similarity, fuzzy matching, and latent semantic analysis.
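To make the lexical measures named above concrete, here is a minimal sketch in Python using only the standard library. The function names, the whitespace tokenizer, the choice of character trigrams, and the use of Jaccard overlap for N-gram similarity are illustrative assumptions, not a reference implementation of any particular system. Latent semantic analysis is left out because it needs a corpus and a matrix factorisation step.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a sketch.
    return text.lower().split()


def cosine_similarity(a, b):
    # Cosine of the angle between the two bag-of-words count vectors.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def char_ngrams(text, n=3):
    # Set of all character n-grams in the lowercased text.
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    # Assumption: N-gram similarity taken as Jaccard overlap of n-gram sets.
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def fuzzy_similarity(a, b):
    # Fuzzy matching via difflib's matching-subsequence ratio in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


if __name__ == "__main__":
    x = "Member States shall adopt the measures necessary to comply"
    y = "Each Member State shall adopt measures needed to comply"
    print(cosine_similarity(x, y))
    print(ngram_similarity(x, y))
    print(fuzzy_similarity(x, y))
```

All three functions return a score between 0 and 1, where higher means more similar. They capture lexical overlap only: two passages that express the same topic in different words would score low on every one of them.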
Text similarity is one of the basic NLP techniques used in legal informatics applications, because legal knowledge is usually spread across different documents, which need to be collected and analysed together.
The University of Luxembourg, together with the University of Turin and APIS Europe, has developed a method called the Unifying Similarity Measure (USM), which mediates different text similarity measures for the automated identification of national implementations of European Union directives (Nanda et al., 2017). Currently, the European Commission resorts to time-consuming and expensive manual conformity-checking studies to identify and check national transpositions. USM was conceived to assist legal experts and speed up this process.