In the era of the World Wide Web, information is easily found through search engines and online databases. Although this has played a significant role in sharing and disseminating knowledge, it also makes intellectual property rights harder to protect against abuse. Plagiarism detection systems, or similar-document finders, attempt to uncover such abuse. The MAHTAB system is one of the projects on plagiarism detection in scientific documents carried out at the Natural Language Processing Laboratory of Shahid Beheshti University.
MAHTAB is a similarity detection system for scientific documents in electrical and computer engineering. It compares each query document against a database of twenty thousand papers and theses in power and computer engineering, ranks the database documents by their similarity to the query document, and displays the results to the user. In addition, the system reports the overall similarity percentage between the query document and each source document, shows the exact locations of the similar passages in the two documents, and reports a separate similarity percentage for each of these passages. Images in the documents are also compared and contribute to the overall similarity percentage. MAHTAB can currently detect exact copies, copies with modifications, and several text-manipulation techniques such as inserting and deleting sentences, splitting and merging sentences, moving words, and replacing words with their synonyms. Because MAHTAB is built on information retrieval methods, it can run on very large databases. The system currently supports Persian and English, and cross-language similarity detection between Persian and English is one of its planned future directions.
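The abstract does not say how MAHTAB computes its similarity percentages or locates similar passages. Purely as an illustration, the sketch below assumes a common baseline: word trigram overlap, with an inverted index standing in for the "information retrieval methods" that let the system avoid comparing the query against every document. The function names `build_index`, `rank_sources`, and `matched_positions`, the trigram size `n=3`, and the containment-style percentage are all hypothetical choices, not MAHTAB's actual design. This baseline only catches verbatim or lightly edited copying; it does not handle synonym replacement or image comparison.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on word characters (Unicode-aware, so Persian letters match too)."""
    return re.findall(r"\w+", text.lower())


def word_ngrams(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in the text."""
    tokens = tokenize(text)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(corpus: Dict[str, str], n: int = 3) -> Dict[Tuple[str, ...], Set[str]]:
    """Hypothetical inverted index: map each n-gram to the IDs of documents containing it."""
    index: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for doc_id, text in corpus.items():
        for gram in word_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def rank_sources(query: str, index: Dict[Tuple[str, ...], Set[str]], n: int = 3) -> List[Tuple[str, float]]:
    """Score only documents that share at least one n-gram with the query.

    The score is the percentage of the query's n-grams found in the source,
    and results are sorted by descending score (ties broken by document ID).
    """
    query_grams = word_ngrams(query, n)
    if not query_grams:
        return []
    shared: Dict[str, int] = defaultdict(int)
    for gram in query_grams:
        # .get avoids inserting empty entries into the defaultdict index
        for doc_id in index.get(gram, ()):
            shared[doc_id] += 1
    scores = [(doc_id, 100.0 * count / len(query_grams)) for doc_id, count in shared.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


def matched_positions(query: str, source: str, n: int = 3) -> List[int]:
    """Start positions (query token indices) of n-grams that also occur in the source."""
    query_tokens = tokenize(query)
    source_grams = word_ngrams(source, n)
    return [
        i for i in range(len(query_tokens) - n + 1)
        if tuple(query_tokens[i:i + n]) in source_grams
    ]


if __name__ == "__main__":
    corpus = {
        "paper-a": "the proposed method detects copied sentences in scientific documents",
        "paper-b": "power systems stability analysis using numerical methods",
    }
    query = "this method detects copied sentences in scientific papers"
    index = build_index(corpus)
    print(rank_sources(query, index))
    print(matched_positions(query, corpus["paper-a"]))
```

In this toy example, `paper-b` shares no trigrams with the query and is never scored, which is the point of the index: on a large collection only candidate documents are examined. `paper-a` shares 4 of the query's 6 trigrams (about 66.7%), and `matched_positions` marks the run of query tokens that overlaps it.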