Abstract
Cross-Language Plagiarism Detection (CLPD) is used to automatically identify and extract plagiarism among documents written in different languages. The main challenge in cross-language plagiarism detection is that the suspect text and the original source are in different languages: the original source must be analysed and translated so that plagiarism can be detected automatically by comparing the suspect text with the original text. This paper proposes an Arabic-English cross-language plagiarism detection method that automatically detects the semantic relatedness between the words of two suspect files. The proposed method consists of four phases. The first phase is pre-processing, the second involves key phrase extraction and translation, the third applies plagiarism detection techniques, and the fourth is the classification process, which uses Linear Logistic Regression (LLR). The method is evaluated using precision and recall on a dataset consisting of Wikipedia articles. The experimental results show 96% precision, 85% recall and a 90.16% F-measure. These results show that the LLR algorithm can be used effectively to detect Arabic-English cross-language plagiarism.
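The reported F-measure can be related to the reported precision and recall with a short calculation. The sketch below assumes the F-measure is the standard balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the paper's F-measure is the standard F1 score.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Rounded values reported in the abstract: 96% precision, 85% recall.
f1 = f_measure(0.96, 0.85)
print(f"F-measure from rounded inputs: {f1:.4f}")
```

With the rounded inputs this gives roughly 0.9017, which agrees with the reported 90.16% to within rounding of the published precision and recall figures.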