Application of Neural Network-Based Machine Learning to the Classification of Scientific Documents (article)

Information Processing and Management (پردازش و مدیریت اطلاعات), Summer 1401 SH, No. 108, internationally ranked (Ministry of Science/ISC), 28 pages, pp. 1217-1244

Keywords: humanities; neural network; classification; scientific documents; scientific publications; ParsBERT; BERT; distributional semantics; vector space


Abstract:

Since the 1380s in the Iranian solar calendar (the 2000s), the writing and publication of scientific articles in Iran has grown rapidly. As a result, in addition to governmental organizations such as Irandoc and the National Library and Archives of the Islamic Republic of Iran, numerous online systems, such as the General Portal of Humanities, Noormags, Magiran, Elmnet, Civilica, and others, have begun to manage knowledge and build structured archives of scientific documents. Each of these archives offers its users a set of facilities, one of which is document search, and accurate search can greatly improve the usability of these systems. Increasing search accuracy requires determining the scientific field of each article. Classifying large volumes of scientific resources into fields by hand is very time-consuming, and automatic methods can reduce this burden. The main goal of this paper is to provide a classification model that determines the field of Persian scientific articles. Previous studies have mainly applied conventional plain-text classification algorithms. In this study, in addition to those classifiers, neural network-based classifiers, such as convolutional neural networks and perceptrons, are used together with a contextualized semantic representation, ParsBERT, and the results are compared with other common methods of building document vectors, such as Word2Vec. To this end, we use data from the General Portal of Humanities, which covers a variety of articles in the humanities, each labeled with its specialized field. One characteristic of neural networks is that a combination of latent features of the data is formed in the constructed vector space and used to train the model.
According to the experimental results, the perceptron classifier based on the ParsBERT representation achieved the highest performance: 74.71% micro F-score and 72.55% macro F-score.
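
The abstract reports both a micro and a macro F-score but does not say how they were averaged. The sketch below shows the conventional definitions, assuming single-label classification and taking macro F as the unweighted mean of per-class F1 values. The function name `f_scores`, the field labels, and the toy predictions are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def f_scores(gold, predicted):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # class p was predicted but is wrong
            fn[g] += 1  # an instance of class g was missed

    def f1(true_pos, false_pos, false_neg):
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy data: a large class ("history") and a small one ("law").
gold = ["history", "history", "history", "history", "law", "law"]
pred = ["history", "history", "history", "history", "history", "law"]
micro_f1, macro_f1 = f_scores(gold, pred)
print(f"micro F1 = {micro_f1:.4f}, macro F1 = {macro_f1:.4f}")
```

Because macro averaging weights every class equally, errors on small classes pull it down more than they pull down the micro score. A micro score above the macro score, as in the reported 74.71% versus 72.55%, is consistent with weaker performance on less frequent fields, although the abstract does not confirm this.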
