Search Results

Found 130592 documents matching the query

Karin Marshanda

Penerapan Adaptive Synthetic Sampling Approach dalam Menangani Ketidakseimbangan Kelas pada Dataset Wi-Fi Attacks = Application of Adaptive Synthetic Sampling Approach in Handling Class Imbalance in Wi-Fi Attacks Dataset

"Instrusion Detection System (IDS) merupakan sistem untuk mendeteksi serangan dalam jaringan, baik lokal maupun internet. Dalam melakukan deteksi penyalahgunaan atau deteksi anomali, beberapa peneliti telah menggunakan data mining untuk mengidentifikasi berbagai jenis intrusi, termasuk yang jarang terjadi. Namun, data mining rentan terhadap data imbalance (data tidak seimbang) yang dapat mengurangi efektivitas algoritma klasifikasi karena asumsi mayoritas classifier terhadap distribusi yang seimbang. Berdasarkan permasalahan tersebut, maka akan dilakukan penelitian terkait penanganan data imbalance menggunakan metode Adaptive Synthetic Sampling (ADASYN) dengan cara menghasilkan data sintetis pada kelas minoritas agar algoritma klasifikasi dapat bekerja lebih baik. Metode ADASYN efektif bekerja pada variabel prediksi berjumlah 2 kelas (binary class), namun dikarenakan penelitian ini berurusan dengan masalah multiclass, makan akan digunakan pendekatan One-Vs-One (OVO) untuk menyeimbangkan kelas. Keefektifan ADASYN akan dievaluasi melalui implementasinya pada dataset Wi-Fi attacks, yaitu Aegean Wi-Fi Intrusion Dataset (AWID2). Data sebelum dan setelah rebalancing dievaluasi dengan menggunakan metode klasifikasi seperti regresi logistik dan Support Vector Machine (SVM), untuk dibandingkan nilai precision, recall, spesifisitas, serta F1-score dari kedua dataset tersebut. Meskipun ADASYN hanya meningkatkan nilai precision dalam dataset Wi-Fi attacks, dengan menggunakan metode klasifikasi SVM kernel polynomial terbukti efektif dalam mendeteksi kelas serangan, meskipun performa metrik lainnya tidak mencapai tingkat yang sama.

An Intrusion Detection System (IDS) is a system designed to detect attacks within networks, both local and internet-based. For misuse detection and anomaly detection, researchers have used data mining to identify various types of intrusions, including those that occur infrequently. However, data mining is susceptible to data imbalance, which can reduce the effectiveness of classification algorithms because most classifiers assume a balanced class distribution. To address this issue, this research handles data imbalance using the Adaptive Synthetic Sampling (ADASYN) method, which generates synthetic data for the minority class so that classification algorithms can perform better. ADASYN works well when the target variable has two classes (binary class), but since this study deals with a multiclass problem, a One-Vs-One (OVO) approach is used to balance the classes. The effectiveness of ADASYN is evaluated by applying it to a Wi-Fi attacks dataset, namely the Aegean Wi-Fi Intrusion Dataset (AWID2). The data before and after rebalancing are evaluated using classification methods such as logistic regression and Support Vector Machine (SVM), and the precision, recall, specificity, and F1-score of the two datasets are compared. Although ADASYN improved only precision on the Wi-Fi attacks dataset, SVM with a polynomial kernel proved effective at detecting the attack classes, even though the other metrics did not reach the same level."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2024

14-24-64198984

UI - Skripsi Membership Universitas Indonesia Library

Rafa Elmira Afiani

Implementasi Metode Whale Optimization Algorithm-Support Vector Machine dalam Klasifikasi Serangan Siber pada Jaringan Internet of Things = Implementation of Whale Optimization Algorithm-Support Vector Machine Method in Cyber Attack Classification on Internet of Things Networks

"Internet of Things (IoT) merupakan sebuah teknologi yang memungkinkan perangkat untuk berkomunikasi dan mengirimkan data melalui jaringan tanpa campur tangan manusia. Kompleksitas pada jaringan IoT menyebabkan sistem mengalami kesulitan dalam mendeteksi properti serangan dan memaksa sistem untuk memperkuat keamanannya. Salah satu upaya yang paling sering digunakan untuk pertahanan jaringan IoT adalah Intrusion Detection System (IDS). Penggunaan IDS dapat memberikan peringatan dini dan mampu melakukan pencegahan terhadap potensi serangan pada jaringan. Penelitian ini menggunakan dataset Aegean WIFI Intrusion Dataset (AWID2) yang berisikan lalu lintas trafik internet pada jaringan WIFI. Data AWID2 berisi 2,3 juta records dan dikelompokkan ke dalam empat kelas yaitu normal, impersonation, injection, dan flooding. Penelitian ini dilakukan untuk melakukan klasifikasi jenis serangan siber pada jaringan IoT melalui penerapan teknik machine learning dengan metode Whale Optimization Algorithm – Support Vector Machine (WOA-SVM) dengan kernel RBF dan pendekatan One vs Rest, dimana Whale Optimization Algorithm (WOA) digunakan sebagai optimasi parameter yang digunakan pada metode Support Vector Machine (SVM). Untuk mengatasi permasalahan dimensi data yang tinggi pada dataset yang digunakan, dilakukan seleksi fitur untuk reduksi dimensi data dengan menggunakan metode seleksi fitur filter Information Gain. Kinerja model dievaluasi berdasarkan nilai metrik accuracy, precision, recall, dan F1 Score dengan memperhatikan waktu klasifikasi dan proprosi train-test split berkisar dari 50%-90%. Hasil penelitian menunjukkan bahwa model WOA-SVM memperoleh kinerja terbaik dengan menggunakan 40 fitur terbaik dari hasil seleksi fitur Information Gain menghasilkan tingkat accuracy sebesar 99,5951%, precision sebesar 96,3928%, recall sebesar 99,8888%, F1 Score sebesar 98,0662%, dan waktu klasifikasi selama 16,831 detik. 
Hasil kinerja model WOA-SVM tersebut lebih baik jika dibandingkan dengan tanpa menggunakan seleksi fitur dan SVM tanpa optimasi parameter WOA.

The Internet of Things (IoT) is a technology that enables devices to communicate and transmit data over a network without human intervention. The complexity of IoT networks makes it difficult for systems to detect attack properties and forces them to strengthen their security. One of the most commonly employed defenses for IoT networks is the Intrusion Detection System (IDS). An IDS provides early warnings and can prevent potential attacks on the network. This study uses the Aegean Wi-Fi Intrusion Dataset (AWID2), which contains internet traffic data on Wi-Fi networks. The AWID2 dataset comprises 2.3 million records categorized into four classes: normal, impersonation, injection, and flooding. This research aims to classify types of cyber attacks on IoT networks by applying machine learning techniques using the Whale Optimization Algorithm - Support Vector Machine (WOA-SVM) method with an RBF kernel and a One vs. Rest approach, where the Whale Optimization Algorithm (WOA) is used to optimize the parameters of the Support Vector Machine (SVM) method. To address the high dimensionality of the dataset, feature selection is performed to reduce the data dimensions using the Information Gain filter method. The model's performance is evaluated based on accuracy, precision, recall, and F1 Score, taking into account classification time and train-test split proportions ranging from 50% to 90%. The results indicate that the WOA-SVM model achieves its best performance using the top 40 features from Information Gain feature selection, yielding an accuracy of 99.5951%, precision of 96.3928%, recall of 99.8888%, F1 Score of 98.0662%, and a classification time of 16.831 seconds. This performance is better than that of the model without feature selection and of SVM without WOA parameter optimization."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Syahrul Amrie

Analisis sentimen terhadap layanan imigrasi menggunakan data Twitter, Instagram dan ulasan pada aplikasi M-Paspor di Google play store berbasis pembelajaran mesin = Sentiment analysis on immigration services using data Twitter, Instagram and review application M-paspor on Google play store based on machine learning

"Perkembangan media sosial telah berkembang pesat, tidak hanya sebagai alat komunikasi sosial antar individu. Fungsi dan kegunaannya semakin berkembang serta banyak dimanfaatkan organisasi swasta maupun pemerintah untuk mengukur tingkat layanan. Ditjen Imigrasi selaku organisasi pemerintah merupakan salah satu organisasi yang memanfaatkan media sosial, salah satu fungsinya untuk mengetahui apakah layanan yang diberikan telah diterima dengan baik oleh masyarakat. Selain melalui media sosial, Imigrasi juga telah meluncurkan aplikasi M-Paspor di platform Google Play Store, di platform tersebut Imigrasi juga dapat mengetahui tingkat efektivitas dari aplikasi yang telah diluncurkan. Berdasarkan survei yang dilakukan oleh Balitbangham yang merupakan internal dari Kemenkumham, layanan yang diberikan oleh imigrasi mendapat nilai sangat baik, namun faktanya pada media sosial maupun google play store banyak komentar maupun ulasan yang kurang puas dengan pelayanan pihak imigrasi. Hal tersebut menjadi kontradiksi antara hasil survei Balitbangham dan data di media sosial. Namun, akan sulit untuk melakukan analisis data media sosial dikarenakan jumlah yang banyak. Oleh karena itu, perlu dilakukan untuk mengusulkan sistem untuk melakukan analisis sentimen menggunakan data teks komentar dan ulasan. Sehingga pihak Imigrasi dapat mengambil langkah terbaik untuk dapat memperbaiki layanan yang masih belum maksimal. Dataset yang digunakan berupa data yang diambil dari media sosial Twitter dan Instagram serta ulasan pada Google Play Store. Hasil penelitian menunjukan jika fitur ekstraksi TF-IDF Unigram yang dipadukan dengan algoritma Support Vector Machine (SVM) serta SMOTE menghasilkan performa paling tinggi dibandingkan dengan nave Bayes (NB) maupun Random Forest (RF). dalam melakukan klasifikasi, SVM menghasilkan dengan hasil Precision 72%, Recall 69%, Accurasy 69, serta F1-Score sebesar 68%. 
Model tersebut dapat digunakan Imigrasi untuk mengetahui umpan balik pelayanan dari masyarakat yang dapat digunakan sebagai pertimbangan dalam melakukan perbaikan pelayanan serta merumuskan strategi pelayanan oleh Direktorat terkait agar pelayanan lebih efisien untuk kedepannya. Sehingga, Imigrasi akan mampu dengan cepat merespon kendala yang dihadapai oleh masyarakat.

Social media has developed rapidly and is no longer only a means of social communication between individuals. Its functions and uses keep growing, and it is widely used by private and government organizations to measure service levels. The Directorate General of Immigration, as a government organization, is one of the organizations that uses social media, one purpose being to find out whether the services provided have been well received by the public. Apart from social media, Immigration has also launched the M-Paspor application on the Google Play Store platform, where Immigration can also gauge the effectiveness of the application it has launched. Based on a survey conducted by Balitbangham, which is an internal unit of the Ministry of Law and Human Rights (Kemenkumham), the services provided by Immigration received a very good score, but in fact many comments and reviews on social media and the Google Play Store express dissatisfaction with Immigration's services. This is a contradiction between the results of the Balitbangham survey and the data on social media. However, social media data are difficult to analyze because of their large volume. Therefore, a system is proposed to perform sentiment analysis on the text of comments and reviews, so that Immigration can take the best steps to improve services that are not yet optimal. The dataset consists of data taken from the social media platforms Twitter and Instagram as well as reviews on the Google Play Store. The results show that TF-IDF unigram feature extraction combined with the Support Vector Machine (SVM) algorithm and SMOTE produces the highest performance compared with Naïve Bayes (NB) and Random Forest (RF). In classification, SVM achieves 72% Precision, 69% Recall, 69% Accuracy, and 68% F1-Score. 
This model can be used by Immigration to find out service feedback from the community as a consideration in making service improvements and formulating more efficient service strategies for the future. Thus, Immigration will be able to quickly respond to the obstacles faced by the community."

Jakarta: Fakultas Ilmu Komputer Universitas Indonesia, 2022

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Ashma Hanifah Shalihah

Evaluasi Kualitas Layanan Aplikasi melalui Text Mining Ulasan Pengguna: Studi Kasus Aplikasi Dompet Digital LinkAja = Evaluation of Application Service Quality using Text Mining from User Reviews: A Case Study of the LinkAja Digital Wallet Application

"Penelitian ini mengevaluasi kualitas layanan aplikasi dompet digital LinkAja melalui analisis ulasan online pengguna menggunakan pendekatan text mining. Proses penelitian meliputi ekstraksi ulasan, pra-pemrosesan data, analisis sentimen, deteksi keluhan dengan topic modelling, pengukuran skor kualitas layanan, dan evaluasi total skor. Sebanyak 50.626 data ulasan diekstraksi dari Google Play Store dan Apple App Store. Pengukuran skor kualitas layanan dilakukan dengan mengidentifikasi dimensi kualitas layanan berdasarkan literatur terkait, serta mendefinisikan kata kunci representatif yang divalidasi melalui metode Delphi. Hasil penelitian menunjukkan bahwa model Support Vector Machine (SVM) menunjukkan performa terbaik dengan akurasi 89,30%, diikuti oleh Random Forest dengan akurasi 85,08%, dan Naive Bayes dengan akurasi 73,91%. Ulasan pengguna didominasi oleh sentimen negatif, dengan topik-topik keluhan utama berkaitan dengan persepsi kemudahan pengguna, layanan pelanggan, dan kecepatan transaksi. Selain itu, dari pengukuran skor kualitas layanan, ditemukan bahwa faktor kunci yang berpengaruh signifikan terhadap persepsi kualitas layanan pengguna dalam konteks dompet digital adalah keandalan, kualitas informasi, dan responsivitas. Penelitian ini memiliki keterbatasan dalam mendeteksi atau mengklasifikasikan sentimen dari buzzer, yang sering memposting konten berulang untuk mempengaruhi opini publik. Ketidaktercakupan ini dapat menyebabkan bias dalam hasil sentimen. Adapun rekomendasi untuk meningkatkan kualitas layanan meliputi perbaikan stabilitas sistem, peningkatan informasi status transaksi, perbaikan layanan pelanggan, penambahan fitur reset nomor akun, dan optimasi kecepatan transaksi. Implementasi rekomendasi ini diharapkan dapat meningkatkan kepuasan dan loyalitas pengguna terhadap aplikasi LinkAja.

This research evaluates the service quality of the LinkAja digital wallet application through the analysis of online user reviews using a text mining approach. The research process includes review extraction, data pre-processing, sentiment analysis, complaint detection with topic modelling, service quality score measurement, and total score evaluation. A total of 50,626 review data were extracted from Google Play Store and Apple App Store. Service quality score measurement was conducted by identifying service quality dimensions based on related literature, as well as defining representative keywords that were validated through the Delphi method. The results showed that the Support Vector Machine (SVM) model performed best with 89.30% accuracy, followed by Random Forest with 85.08% accuracy, and Naive Bayes with 73.91% accuracy. User reviews were dominated by negative sentiments, with the main complaint topics related to perception of ease of use, customer service, and transaction speed. In addition, from the measurement of service quality scores, it was found that the key factors significantly influencing user perception of service quality in the context of digital wallets are reliability, information quality, and responsiveness. This research has limitations in detecting or classifying the sentiments of buzzers, who often post repetitive content to influence public opinion. This lack of coverage may cause bias in the sentiment results. Meanwhile, the recommendations for improving service quality include improving system stability, improving transaction status information, improving customer service, adding account number reset features, and optimising transaction speed. The implementation of these recommendations is expected to increase user satisfaction and loyalty to the LinkAja application."

Jakarta: Fakultas Ilmu Komputer Universitas Indonesia, 2024

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Rosyda Hanavania

Metode Filter untuk Seleksi Fitur pada Klasifikasi WIFI Attacks = Filter Method for Feature Selection on WIFI Attacks Classification

"Curse of dimensionality atau kutukan dimensi merupakan permasalahan nyata terkait dengan dimensi tinggi pada data. Fenomena ini menyebabkan model bekerja secara tidak optimal, terjadinya overfitting, dan sulitnya proses komputasi data. Kasus data dengan dimensi tinggi ini banyak ditemukan pada data IoT (Internet of Things). Kompleksitas pada ekosistem IoT tersebut membuat sistem mengalami kesulitan dalam penangkapan properti serangan dan memaksa sistem untuk memperkuat keamanannya. Salah satu upaya yang paling banyak digunakan untuk pertahanan sistem IoT adalah dengan Intrusion Detection System (IDS). Penelitian ini menggunakan dataset Aegean WIFI Intrusion Dataset (AWID2) yang berisikan lalu lintas trafik internet pada jaringan WIFI. Data AWID2 berisi 2 juta records dan dikelompokkan ke dalam empat kelas yaitu normal, impersonation, injection, dan flooding. Untuk menyelesaikan permasalahan dimensi tinggi pada data ini, dilakukan teknik reduksi dimensi yaitu seleksi fitur jenis filter. Metode filter yang digunakan yaitu, Correlation based Feature Selection (CFS), Information Gain (IG), dan ANOVA F-test. Setiap metode seleksi fitur tersebut dilanjutkan dengan metode multiclass Support Vector Machines (SVM) one vs rest dan one vs one. Hasil dari penelitian ini menunjukkan bahwa metode fitur seleksi ANOVA F-test dengan metode klasifikasi SVM kernel polynomial dengan menggunakan 7 fitur terbaik merupakan metode paling baik untuk digunakan pada klasifikasi WIFI attacks data AWID2. Hal tersebut ditunjukkan melalui nilai accuracy=0,9766, F1score=0,8385, precision=0,9854, dan recall=0,7708.

The curse of dimensionality is a real problem related to high-dimensional data. This phenomenon can cause suboptimal model performance, overfitting, and computationally expensive data processing. High-dimensional data are commonly found in IoT (Internet of Things) data. The complexity of the IoT ecosystem makes it difficult for the system to capture attack properties and forces the system to strengthen its security. One of the most widely used defenses for IoT systems is the Intrusion Detection System (IDS). This research uses the Aegean WIFI Intrusion Dataset (AWID2), which contains internet traffic on WIFI networks. The AWID2 dataset contains 2 million records grouped into four classes, namely normal, impersonation, injection, and flooding. To overcome the problem of high dimensionality, this study uses a dimensionality reduction technique, namely filter-type feature selection. The filter methods used are Correlation based Feature Selection (CFS), Information Gain (IG), and ANOVA F-test. Each of these feature selection methods is followed by a classification model built with multiclass Support Vector Machines (SVM) using the one vs rest and one vs one methods. The study shows that ANOVA F-test feature selection combined with SVM with a polynomial kernel, using the 7 best features, is the best method for WIFI attacks classification on the AWID2 data. This is indicated by the performance metrics accuracy=0.9766, F1score=0.8385, precision=0.9854, and recall=0.7708."

Depok: Fakultas Matematika Dan Ilmu Pengetahuan Alam Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Hanandi Rahmad Syahputra

Penerapan Discriminant Analysis dan Support Vector Machine dalam Memprediksi Tren Pergerakan Harga Saham di Bursa Efek Indonesia = The Implementation of Discriminant Analysis and Support Vector Machine in Predicting The Trend of Stock Price Movements on the Indonesia Stock Exchange.

"Memprediksi pergerakan harga saham merupakan tugas yang sangat menantang karena karakteristik pasar saham yang kompleks, tidak linier, dan penuh ketidakpastian. Namun berdasarkan pada teori efficient market hypothesis dan tingkat efisiensinya, memprediksi pergerakan harga saham merupakan tugas yang masih memungkinkan untuk dicapai. Banyak pendekatan telah diterapkan untuk memprediksi pergerakan harga saham mulai dari pendekatan statistik linier sederhana seperti discriminant analysis (DA) hingga pendekatan machine learning yang kompleks seperti support vector machine (SVM). Baik DA dan SVM adalah pendekatan yang dapat digunakan untuk melakukan klasifikasi seperti memprediksi tren harga saham dari beberapa kelas. Dalam penelitian ini, tren pergerakan harga saham diklasifikasikan ke dalam dua kelas, yaitu "highly possible to go up" dan "highly possible to go down or be neutral" di mana pemisahan kelasnya didasarkan pada variabel berupa data teknikal, fundamental, keuangan, dan koefisien beta dari saham di Bursa Efek Indonesia (BEI). Dengan menggunakan variabel-variabel ini, sejumlah model prediksi dengan periode prediksi atau fungsi tertentu dilatih dan kemudian digunakan untuk memprediksi tren pergerakan harga saham di BEI. Periode prediksi yang digunakan dalam penelitian ini berkisar dari 1 bulan hingga 9 bulan. Metode stepwise linear regression (SLR) dan sequential forward selection (SFS) diterapkan sebagai metode feature selection guna memilih variabel yang paling relevan sehingga kinerja setiap model prediksi dapat dioptimalkan. Pada penelitian ini, jumlah fitur, nilai signifikansi maksimum dari F-to-enter, fungsi kernel, dan metode parameter selection divariasikan sehingga dihasilkan 12 model prediksi DA dan 30 model prediksi SVM. Dengan menerapkan beberapa proses evaluasi, maka model prediksi dengan tingkat akurasi dan efektifitas yang paling baik dapat dipilih. 
Dari seluruh 12 model prediksi DA yang dirancang, terdapat 3 model prediksi yang dinilai layak untuk diterapkan. Sedangkan dari seluruh 30 model prediksi SVM yang dirancang, terdapat 11 model prediksi yang dinilai layak untuk diterapkan. Kemudian dari 14 model prediksi yang dinilai layak tersebut, 4 model prediksi terbaik untuk periode prediksi 3, 5, 7, dan 9 bulan serta 1 model prediksi terbaik dengan fungsi untuk mengklasifikasi major trend selama 9 bulan telah berhasil dipilih. Kelima model tersebut merupakan model prediksi SVM sehingga dapat disimpulkan bahwa SVM mengungguli DA dalam memprediksi tren pergerakan harga saham di Bursa Efek Indonesia.

Predicting the movement of stock prices is a very challenging task because the characteristics of the stock market are complex, non-linear, and full of uncertainty. However, based on the efficient market hypothesis theory and its level of efficiency, predicting stock price movements is a task that is still possible to achieve. Many approaches have been applied for predicting the movement of stock prices ranging from simple linear statistical approaches such as discriminant analysis (DA) to complex machine learning approaches such as support vector machines (SVM). Both DA and SVM are approaches that can be used to perform classifications such as predicting stock price trends from several classes. In this study, the trends of stock price movements are classified into two classes, namely "highly possible to go up" and "highly possible to go down or be neutral" in which the class separation is based on variables in the form of technical, fundamental, financial, and beta coefficient data of stocks on the Indonesia Stock Exchange (IDX). By using these variables, a number of prediction models with specific prediction periods or functions are trained and then used to predict the trends of stock price movements on the IDX. The prediction periods used in this study range from 1 month to 9 months. The stepwise linear regression (SLR) and sequential forward selection (SFS) methods are applied as the feature selection methods to select the most relevant variables so that the performance of each prediction model can be optimized. In this study, the number of features, the maximum significance value of the F-to-enter, kernel function, and parameter selection method are varied to produce 12 DA prediction models and 30 SVM prediction models. By applying several evaluation processes, the prediction model with the best level of accuracy and effectiveness can be chosen. From all 12 DA prediction models designed, there are 3 prediction models that are considered feasible to be applied. 
While from all 30 SVM prediction models designed, there are 11 prediction models that are considered feasible to be applied. Then, out of these 14 prediction models that are considered feasible, 4 best prediction models for the prediction periods of 3, 5, 7, and 9 months and 1 best prediction model with the function to classify the major trend for 9 months have been successfully selected. These five prediction models are SVM prediction models so that it can be concluded that SVM outperforms DA in predicting the trends of stock price movements on the Indonesia Stock Exchange."

Depok: Fakultas Ekonomi dan Bisnis Universitas Indonesia, 2020

T-pdf

UI - Tesis Membership Universitas Indonesia Library

Kesia Gabriele

Komparasi Metode SMOTE, SMOTE-ENN, dan SMOTE-CUT dalam Menangani Imbalanced Data pada Klasifikasi Multi-Kelas dengan Support Vector Machine (SVM) = Comparative Analysis of SMOTE, SMOTE-ENN, and SMOTE-CUT in Multi-Class SVM Classification for Imbalanced Data

"Support Vector Machine (SVM) merupakan model klasifikasi yang dikenal dengan keakuratan klasifikasi yang tinggi. Namun, Support Vector Machine (SVM) menghasilkan hasil klasifikasi yang kurang optimal jika data yang digunakan tidak seimbang (imbalanced data). Terdapat beberapa cara dalam menangani data yang tidak seimbang, salah satunya dengan metode resampling. Metode resampling sendiri terbagi dalam dua pendekatan yaitu over-sampling dan under-sampling. Salah satu pendekatan over-sampling yang popular adalah Synthetic Minority Over-sampling Technique (SMOTE). SMOTE bekerja dengan membangkitkan sampel sintetis pada kelas minoritas. Untuk meningkatkan kinerja model, SMOTE dapat digabungkan dengan pendekatan under-sampling seperti Edited Nearest Neighbors (ENN) dan Cluster-based Undersampling Technique (CUT). Dalam kombinasinya dengan SMOTE, ENN berperan sebagai cleaning untuk menghapus data sintetis dari penerapan SMOTE yang tidak relevan dan dianggap sebagai noise. Sementara, CUT beperan dalam mengidentifikasi sub-kelas dari kelas mayoritas untuk menekan angka over-sampling sekaligus meminimalisir hilangnya informasi penting pada kelas mayoritas selama proses undersampling. Kombinasi over-sampling dan under-sampling ini saling melengkapi dan mengatasi kekurangan dari masing-masing metode. Penelitian ini memfokuskan perbandingan performa metode resampling SMOTE beserta variasinya, yaitu SMOTEENN dan SMOTE-CUT dalam mengklasifikasikan data multi-kelas yang tidak seimbang menggunakan Support Vector Machine. Dari analisis yang dilakukan, diperoleh kesimpulan bahwa SMOTE-CUT cenderung menghasilkan performa klasifikasi yang lebih baik dibandingkan dengan SMOTE ataupun SMOTE-ENN. Walaupun demikian, keseluruhan metode resampling (SMOTE, SMOTE-ENN, dan SMOTE-CUT) mampu meningkatkan kinerja dari model klasifikasi Support Vector Machine (SVM).

Support Vector Machine (SVM) is a popular classifier known for its high classification accuracy. However, SVM may not perform well on imbalanced datasets. There are several ways to handle imbalanced data, one of which is resampling. Resampling methods are divided into two approaches, over-sampling and under-sampling. One of the popular over-sampling methods is the Synthetic Minority Over-sampling Technique (SMOTE). SMOTE works by generating synthetic samples for the minority class. To improve model performance, SMOTE can be combined with under-sampling methods such as Edited Nearest Neighbors (ENN) or the Cluster-based Under-sampling Technique (CUT). In combination with SMOTE, ENN acts as a cleaning step that removes synthetic data generated by SMOTE that are irrelevant and considered noise. Meanwhile, CUT identifies sub-classes of the majority class to reduce the amount of over-sampling while minimizing the loss of important information in the majority class during the under-sampling process. Combining over-sampling and under-sampling lets the methods complement each other and overcome the weaknesses of each. This research focuses on comparing the performance of the SMOTE resampling method and its variations, SMOTE-ENN and SMOTE-CUT, in classifying imbalanced multi-class data using Support Vector Machine. From the analysis conducted, it was concluded that SMOTE-CUT tends to give better classification performance than SMOTE or SMOTE-ENN. Nevertheless, all of the resampling methods (SMOTE, SMOTE-ENN, and SMOTE-CUT) improve the performance of the Support Vector Machine classification model."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Nurul Qomariah Abdillah

Analisis Perbandingan Kinerja Metode Seleksi Fitur Information Gain Ratio dan Chi-Square dalam Klasifikasi Serangan Siber pada Jaringan Wi-Fi = Comparative Performance Analysis Of Information Gain Ratio and Chi Square Feature Selection Methods in Cyber Attack Classification On Wi-Fi Networks

"Perkembangan teknologi informasi dan komunikasi saat ini menciptakan ketergantungan manusia terhadap teknologi dan internet, salah satunya melalui penggunaan jaringan Wi-Fi. Konektivitas Wi-Fi berkaitan erat dengan Internet of Things (IoT) karena dapat memfasilitasi perangkat IoT untuk saling terhubung dan terkoneksi ke jaringan internet. Namun, peningkatan penggunaan Wi-Fi publik maupun privat rentan terhadap serangan siber. Badan Sandi dan Siber Negara memperkirakan tahun 2024 akan muncul ancaman seperti IoT attacks, distributed denial of services (DDOS), phishing, dan lainnya. Oleh karena itu, perlu adanya upaya antisipatif untuk mengatasi serangan siber. Salah satu upayanya adalah menerapkan intrusion detection system (IDS) untuk memantau lalu lintas jaringan dan memberikan peringatan jika terdapat serangan. Peningkatan kemampuan deteksi IDS dapat dilakukan dengan menerapkan metode machine learning yang mampu mempelajari pola serangan secara efektif dan akurat. Pada penelitian skripsi ini diterapkan metode klasifikasi Support Vector Machine (SVM) Multiclass dengan pendekatan one-vs-one dan one-vs-rest pada dataset Aegean Wi-Fi Intrusion Detection System (AWID2) yang terdiri dari empat kelas dan memiliki dimensi data yang tinggi, yaitu 154 dimensi (fitur). Dalam mengatasi masalah dimensi tinggi tersebut dilakukan seleksi fitur yang bertujuan untuk menghilangkan fitur yang tidak relevan, sehingga fitur hanya terkonsentrasi pada fitur- fitur yang relevan dan informatif dalam menggambarkan serangan. Penelitian skripsi ini menggunakan metode Chi-square dan Information Gain Ratio. Hasil penelitian skripsi ini menunjukkan metode seleksi fitur Chi-square dengan klasifikasi SVM One Vs Rest pada kernel polynomial dengan memilih 54 fitur tertinggi merupakan model terbaik dalam mengklasifikasikan serangan siber pada Wi-Fi dengan nilai accuracy = 98,03%, Precision = 87,24%, Recall = 99,30%, dan F1 score = 91,90%.

Advances in information and communication technology have made people dependent on technology and the internet, including through the use of Wi-Fi networks. Wi-Fi connectivity is closely tied to the Internet of Things (IoT) because it lets IoT devices connect to one another and to the internet. However, the growing use of public and private Wi-Fi networks is vulnerable to cyberattacks. Indonesia's National Cyber and Crypto Agency (BSSN) predicts that threats such as IoT attacks, distributed denial-of-service (DDoS) attacks, phishing, and others will emerge in 2024. Anticipatory measures against cyberattacks are therefore needed. One such measure is to deploy an intrusion detection system (IDS) that monitors network traffic and raises an alert when an attack occurs. IDS detection capability can be improved by applying machine learning methods that learn attack patterns effectively and accurately. In this undergraduate thesis, a multiclass Support Vector Machine (SVM) classifier with one-vs-one and one-vs-rest approaches was applied to the Aegean Wi-Fi Intrusion Dataset (AWID2), which has four classes and high dimensionality, namely 154 dimensions (features). To address the high dimensionality, feature selection was performed to remove irrelevant features, so that only features that are relevant and informative for describing attacks remain. The thesis uses the Chi-square and Information Gain Ratio methods. The results show that Chi-square feature selection with a one-vs-rest SVM using a polynomial kernel and the 54 highest-ranked features is the best model for classifying cyberattacks on Wi-Fi, with accuracy = 98.03%, precision = 87.24%, recall = 99.30%, and F1 score = 91.90%."
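
As a rough illustration of the feature-selection step this abstract describes, the sketch below scores non-negative features with a chi-square statistic and keeps the k highest-scoring ones. It is a minimal standard-library sketch under stated assumptions, not the thesis's implementation: the function names `chi2_scores` and `top_k_features` and the toy data are hypothetical, and the abstract does not say how the AWID2 features were preprocessed before scoring.

```python
from collections import defaultdict


def chi2_scores(X, y):
    """Return one chi-square score per feature (features must be non-negative).

    Observed value: the feature's total within each class.
    Expected value: the feature's overall total times the class frequency.
    """
    n_samples = len(X)
    n_features = len(X[0])

    class_counts = defaultdict(int)
    for label in y:
        class_counts[label] += 1

    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)
        observed = defaultdict(float)
        for row, label in zip(X, y):
            observed[label] += row[j]

        score = 0.0
        for label, count in class_counts.items():
            expected = total * count / n_samples
            if expected > 0:  # skip features that are zero everywhere
                score += (observed[label] - expected) ** 2 / expected
        scores.append(score)
    return scores


def top_k_features(scores, k):
    """Return the indices of the k highest-scoring features, in ascending order."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(order[:k])


# Hypothetical toy data: 4 samples, 3 binary features, 2 classes.
toy_X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
toy_y = ["normal", "normal", "attack", "attack"]

toy_scores = chi2_scores(toy_X, toy_y)
selected = top_k_features(toy_scores, k=2)
```

In the toy data the third feature is spread evenly across both classes, so it scores zero and is dropped. The thesis reports keeping the 54 highest-ranked of 154 features before training the SVM, which would correspond to `k=54` here.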

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Alya Nadifa Putri

Penerapan Support Vector Machines dan Perbandingan Metode Risk Parity, Minimum Variance, dan Equal-Weight dalam Pemilihan Portofolio ETF di Indonesia = Application of Support Vector Machines and Comparison of Risk Parity, Minimum Variance, and Equal-Weight Methods for ETF Portfolio Selection in Indonesia

"Exchange Traded Funds (ETF) adalah salah satu produk investasi pasar modal yang berupa reksa dana dan diperjualbelikan secara real time layaknya saham. ETF dapat menjadi pilihan investasi yang cocok untuk investor pemula karena lebih terdiversifikasi daripada saham. Meskipun demikian, investor tetap harus menyesuaikan profil risiko masing-masing karena semua produk investasi pasti memiliki risiko yang harus dihadapi. Oleh karena itu, sebelum membeli produk investasi perlu dilakukan analisis terlebih dahulu. Dalam penelitian ini dilakukan analisis menggunakan indikator teknikal untuk mengklasifikasi ETF menggunakan metode Support Vector Machines (SVM). Data ETF yang digunakan adalah data historis mingguan 25 ETF yang terdaftar di Bursa Efek Indonesia sejak 9 Maret 2020 hingga 6 Maret 2022. Indikator teknikal yang digunakan adalah moving average, support and resistance, Bollinger bands, dan directional indicator. Hasil dari perhitungan analisis indikator teknikal tersebut selanjutnya digunakan sebagai data input atau fitur dalam proses klasifikasi SVM. Proses klasifikasi bertujuan untuk mengklasifikasikan ETF yang berpotensi menghasilkan return ≥ 1 (return positif) atau < 1 (return negatif) di minggu selanjutnya dengan model SVM terbaik. Model SVM terbaik ditentukan berdasarkan nilai akurasi tertinggi. Pada penelitian ini, model SVM terbaik menghasilkan akurasi sebesar 77% dengan kernel polinomial dan proporsi data training sebanyak 80%. Terdapat 14 ETF yang diprediksi menghasilkan kelas positif oleh model SVM terbaik dan selanjutnya dilakukan pembentukan portofolio menggunakan metode Risk Parity (RP), Minimum Variance (MinV), dan Equal-Weight (EW). Ketiga metode pembentukan portofolio tersebut dibandingkan performanya untuk memilih portofolio terbaik berdasarkan nilai rasio Sharpe tertinggi. Hasil dari penelitian ini, metode MinV menghasilkan rasio Sharpe tertinggi dibandingkan dua metode lainnya.

Exchange-traded funds (ETFs) are capital market investment products in the form of mutual funds that are traded in real time like stocks. ETFs can suit new investors because they are more diversified than individual stocks. Nonetheless, investors must still match products to their own risk profile, since every investment product carries risk. An analysis should therefore be done before buying an investment product. In this study, ETFs were analyzed using four technical indicators (moving averages, support and resistance, Bollinger bands, and directional indicators) and classified with the Support Vector Machines (SVM) method. The data consisted of weekly historical data for 25 ETFs listed on the Indonesia Stock Exchange from March 9, 2020, to March 6, 2022. The results of the technical indicator calculations were then used as features in the SVM classification process. The classification aims to identify ETFs with the potential to generate a return of ≥ 1 (positive return) or < 1 (negative return) in the following week, using the best SVM model. The best SVM model was chosen by highest accuracy; it achieved an accuracy of 77% with a polynomial kernel and 80% of the data used for training. The best model predicted 14 ETFs in the positive class, and portfolios were then formed from them using the Risk Parity (RP), Minimum Variance (MinV), and Equal-Weight (EW) methods. The performance of the three portfolios was compared to select the best one based on the highest Sharpe ratio. In this study, the SVM-MinV method produced the highest Sharpe ratio."
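
To make the portfolio comparison in this abstract concrete, here is a minimal standard-library sketch of the three weighting schemes and the Sharpe ratio. Several choices are assumptions rather than the thesis's method: the minimum-variance weights use the unconstrained closed form (short positions allowed), risk parity is approximated by inverse-volatility weighting (which matches true risk parity only in special cases such as uncorrelated assets), and all function names and the two-asset toy inputs are hypothetical.

```python
import math


def equal_weight(n):
    """Equal-Weight: every asset gets 1/n."""
    return [1.0 / n] * n


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def min_variance(cov):
    """Unconstrained minimum-variance weights: proportional to inverse(cov) times a vector of ones."""
    raw = solve(cov, [1.0] * len(cov))
    total = sum(raw)
    return [v / total for v in raw]


def inverse_volatility(cov):
    """Simplified stand-in for risk parity: weights proportional to 1 / volatility."""
    inv = [1.0 / math.sqrt(cov[i][i]) for i in range(len(cov))]
    total = sum(inv)
    return [v / total for v in inv]


def portfolio_variance(weights, cov):
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))


def sharpe_ratio(weights, mean_returns, cov, risk_free=0.0):
    """Excess portfolio return divided by portfolio standard deviation."""
    mu = sum(w * m for w, m in zip(weights, mean_returns))
    return (mu - risk_free) / math.sqrt(portfolio_variance(weights, cov))


# Hypothetical two-asset inputs (weekly mean returns and covariance matrix).
toy_mean = [0.02, 0.01]
toy_cov = [[0.04, 0.0], [0.0, 0.01]]

candidates = {
    "EW": equal_weight(2),
    "MinV": min_variance(toy_cov),
    "RP (inverse-vol proxy)": inverse_volatility(toy_cov),
}
sharpes = {name: sharpe_ratio(w, toy_mean, toy_cov) for name, w in candidates.items()}
```

Ranking the entries of `sharpes` mirrors the selection rule in the abstract, where the portfolio with the highest Sharpe ratio is preferred. A long-only or exact risk-parity version would need a numerical optimizer, which is left out of this sketch.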

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2022

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Ni Putu Ayu Audia Ariantari

Perbandingan aplikasi dari klasifikasi support vector machines dan fuzzy support vector machines dalam memprediksi future claim pada asuransi kendaraan bermotor = Comparison between support vector machines and fuzzy support vector machines as classifiers for predicting future claim in automobile insurance

"Kestabilan perekonomian suatu negara ditentukan oleh sektor-sektor ekonomi di dalamnya. Salah satu sektor yang sedang berkontribusi secara signifikan di Indonesia adalah asuransi. Industri Asuransi sedang mengalami perluasan pada beberapa tahun terakhir. Seiring dengan perluasan tersebut, terdapat kompetisi antar perusahaan asuransi di Indonesia. Kompetisi ini menuntut perusahaan asuransi untuk lebih cerdik dalam mengungguli pasar. Tetapi, perlu diperhatikan bahwa perusahaan asuransi harus selalu sadar akan tingkat risiko yang harus ditanggungnya. Sehingga perlunya dilakukan penelitian tentang kemungkinan klaim di masa depan dari perusahaan asuransi.

Dalam penelitian ini, akan difokuskan pada sektor asuransi kendaraan bermotor di Indonesia. Model yang diajukan pada penelitian ini adalah suatu machine learning yang biasa digunakan untuk masalah klasifikasi dan prediksi. Metode klasifikasi yang digunakan adalah Support Vector Machines dan Fuzzy Support Vector Machines. Penelitian ini menggunakan data historis polis dari suatu perusahaan asuransi umum di Indonesia. Data historis polis ini terdiri dari 7.373 data dengan periode waktu berlaku polis adalah setahun terhitung dari Januari 2015 sampai dengan Desember 2016. Setelah itu, dibandingkan hasil dari kedua metode tersebut untuk mendapatkan hasil yang terbaik. Penggunaan data historis polis dari suatu asuransi umum di Indonesia ini menunjukkan bahwa Support Vector Machines menghasilkan tingkat akurasi rata-rata 100% dalam klasifikasi dua kelas yaitu klaim dan tidak klaim. Memang waktu yang dibutuhkan relatif lama dalam mengklasifikasi data yaitu 4673,33 detik. Kemudian dibandingkan hasil olahan dengan klasifikasi Fuzzy Support Vector Machines dengan komposisi 80% training data dan akurasi yang dihasilkan adalah 99,23%.

The economic stability of a country depends on its economic sectors. One sector that is contributing significantly in Indonesia is insurance. The insurance industry has grown rapidly in recent years. As it grows, competition among insurance companies in Indonesia has intensified. This competition pushes insurers to be sharper in order to win the market. At the same time, an insurance company must remain aware of the level of risk it has to bear and should be prepared for the possibility of high indemnities. A study of future claims is therefore needed.

This study focuses on automobile insurance in Indonesia. The proposed model uses machine learning methods commonly applied to classification and prediction problems. The classification methods used are Support Vector Machines and Fuzzy Support Vector Machines, and the aim of the study is to compare them. The study uses historical policy data from a general insurance company in Indonesia. The data consist of 7,373 records for policies with a one-year term, covering January 2015 to December 2016. The results of the two methods were compared to find the better one. On this data, classification with Support Vector Machines yields an average accuracy of 100% for the binary problem of claim versus no claim, although it takes a relatively long time to classify the data, about 4,673.33 seconds. Classification with Fuzzy Support Vector Machines using 80% of the data for training yields an accuracy of 99.23%."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2017

S-Pdf

UI - Skripsi Membership Universitas Indonesia Library
