Search Results

Found 162779 documents matching the query

Lista Kurniawati

Analisis Kinerja Gabungan Metode Representasi Teks BERT dan Metode Clustering DEC untuk Pendeteksian Topik = Performance Analysis of BERT as Text Representation Method and DEC Clustering Method for Topic Detection

"Topic detection is a computational problem that analyzes the words of textual data to find the topics it contains. On large datasets, topic detection is performed more effectively and efficiently with machine learning methods. Textual data must be converted into a numerical vector representation before it is fed to a machine learning model. A commonly used text representation method is TF-IDF. However, this method produces a representation that does not take context into account. BERT (Bidirectional Encoder Representations from Transformers) is a text representation method that takes into account the context of a word in a document. This study compares the performance of the BERT model with that of the TF-IDF model in topic detection. The resulting text representations are then fed to a machine learning model. One machine learning approach that can be used for topic detection is clustering. A popular clustering method is Fuzzy C-Means. However, Fuzzy C-Means is not effective on high-dimensional data. Because news text data usually has fairly high dimensionality, a dimension reduction step is needed. There is now a clustering method that performs deep-learning-based dimension reduction, namely Deep Embedded Clustering (DEC). In this research, the DEC model is used to detect topics. The topic detection experiment using the DEC (member) model with the BERT text representation on news text data shows a slightly better coherence value than the same experiment with the TF-IDF text representation."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2022

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
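
The abstract above notes that TF-IDF does not take context into account. A minimal sketch of that point, assuming one common TF-IDF variant (tf = count / document length, idf = ln(N / df)); the function name `tfidf` and the toy documents are illustrative and do not come from the thesis:

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: tf = count / document length, idf = ln(N / df).
    Other TF-IDF variants exist; this one is chosen for simplicity.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy documents: "bank" is used in two different senses.
docs = [["bank", "river", "water"], ["bank", "money", "loan"]]
weights = tfidf(docs)
```

In this sketch "bank" receives the same weight in both documents even though its sense differs, because the weight depends only on counts and not on the neighboring words. A contextual model such as BERT would instead produce a different vector for each occurrence.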

Nicholas Ramos Richardo

Analisis Performa EFCM dengan BERT sebagai Representasi Teks pada Pendeteksian Topik = The Performance of EFCM with BERT as Text Representation on Topic Detection

"Topic detection is the process of determining the topics in a text by analyzing the words in it. Topic detection can be done by reading the contents of the text. However, this becomes increasingly difficult as the amount of data grows. Machine learning methods offer an alternative for handling large amounts of data. Clustering is a method for grouping similar data within a dataset. Examples of clustering methods include K-Means, Fuzzy C-Means (FCM), and Eigenspace-Based Fuzzy C-Means (EFCM). EFCM is a clustering method that combines the Truncated Singular Value Decomposition (TSVD) dimension reduction method with the FCM method (Murfi, 2018). In topic detection, text must be represented as numerical vectors because a clustering model cannot process data in text form. The method most commonly used previously is Term Frequency-Inverse Document Frequency (TFIDF). In 2018 a new method was introduced, namely Bidirectional Encoder Representations from Transformers (BERT). BERT is a pretrained language model developed by Google. This study uses the BERT model and the EFCM clustering method for topic detection. Model performance is evaluated using the coherence metric. The simulation results show that the modified TFIDF method for topic determination is superior to the centroid-based method, giving a higher coherence value on two of the three datasets used. In addition, BERT is superior to TFIDF, with a higher coherence value on all three datasets."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2022

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
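
The abstract above describes EFCM as TSVD dimension reduction followed by FCM. The sketch below illustrates only the FCM step, on vectors assumed to be already reduced; the TSVD step is omitted, and the function name, defaults, and toy data are illustrative rather than taken from the thesis:

```python
import math
import random

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means on a list of equal-length float vectors.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    m > 1 is the degree of fuzziness.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append([
                sum(wi * p[d] for wi, p in zip(w, points)) / w_sum
                for d in range(dim)
            ])
        # Membership update: u_ik proportional to d_ik ** (-2 / (m - 1)).
        max_change = 0.0
        for i, p in enumerate(points):
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            inv_sum = sum(inv)
            new_row = [v / inv_sum for v in inv]
            max_change = max(
                max_change, max(abs(a - b) for a, b in zip(new_row, u[i]))
            )
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u

# Toy data: two well-separated groups standing in for reduced text vectors.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centers, memberships = fuzzy_c_means(points, n_clusters=2)
labels = [row.index(max(row)) for row in memberships]
```

Unlike K-Means, every point keeps a membership degree in every cluster; taking the largest membership per row, as in the last line, yields a hard assignment when one is needed.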

Alvin Subakti

Analisis kinerja BERT sebagai metode representasi teks untuk Text Clustering = Performance analysis of BERT as a text representation method for Text Clustering

"Text clustering is the task of grouping a set of texts such that texts in the same group are more similar to each other than to those in different groups. Grouping texts manually requires a significant amount of time and resources, so machine learning is used to perform the grouping automatically. A representation of the text must be extracted before it is fed to a machine learning model. The method commonly used to extract text representations is TFIDF. However, TFIDF has the drawback that it does not consider the position and context in which a word is used. The BERT model can produce word representations that depend on the position and context of a word in a sentence. This research analyzes the performance of the BERT model as a text representation method by comparing it with TFIDF. In addition, it implements and compares different feature extraction and normalization methods applied to the text representations produced by the BERT model. The feature extraction methods used are max and mean pooling. The feature normalization methods used are identity, layer, standard, and min-max normalization. The resulting text representations are fed to four different clustering algorithms: k-means clustering, eigenspace-based fuzzy c-means, deep embedded clustering, and improved deep embedded clustering. The performance of the text representations is evaluated using clustering accuracy, normalized mutual information, and adjusted Rand index. Simulation results show that the text representations produced by the BERT model outperform those produced by TFIDF on 28 out of 36 metrics. Furthermore, different feature extraction and normalization methods yield different performance on the BERT model and need to be matched to the clustering algorithm used."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
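
The abstract above names max and mean pooling as feature extraction methods and lists several normalizations. A minimal sketch of how per-token vectors could be pooled into one document vector and then normalized. The toy token vectors are hypothetical stand-ins for BERT output, and applying the normalizations per document vector is an assumption, since the abstract does not say along which axis they are applied:

```python
import math

def mean_pool(token_vectors):
    """Average each feature over all tokens of one text."""
    n = len(token_vectors)
    return [sum(col) / n for col in zip(*token_vectors)]

def max_pool(token_vectors):
    """Take the maximum of each feature over all tokens of one text."""
    return [max(col) for col in zip(*token_vectors)]

def min_max_normalize(vec):
    """Rescale one vector to the range [0, 1]."""
    lo, hi = min(vec), max(vec)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in vec]
    return [(v - lo) / span for v in vec]

def layer_normalize(vec, eps=1e-12):
    """Shift and scale one vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

# Hypothetical 2-dimensional embeddings for a two-token text.
tokens = [[1.0, 4.0], [3.0, 2.0]]
mean_vec = mean_pool(tokens)   # [2.0, 3.0]
max_vec = max_pool(tokens)     # [3.0, 4.0]
scaled = min_max_normalize(mean_vec)
normed = layer_normalize(mean_vec)
```

Identity normalization corresponds to using `mean_vec` or `max_vec` unchanged.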

Alvin Subakti

Analisis kinerja BERT sebagai metode representasi teks untuk text clustering = Performance analysis of BERT as a text representation method for text clustering.

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Anne Parlina

Analisis tren penelitian dari koleksi publikasi ilmiah dengan metode deteksi topik berbasis clustering = Cluster-based topic detection method for research and publication trend analytics

"A trend is a recurring pattern, while trend analysis is the practice of collecting and analyzing data to find that pattern. Trend analysis is a method for projecting future conditions based on data from the past up to the present. Systematic literature review, bibliometrics, and topic modeling are examples of approaches often used to capture how trends in science and technology develop. This study tests and implements clustering-based topic detection algorithms, combined with qualitative analysis, to obtain a comprehensive picture of the concepts, scientific structure, main topics, and development of the fields of big data technology and smart sustainable cities. The topic analysis is performed on a collection of bibliographic data from scientific publications in these two fields, obtained from the Scopus and CORE databases. Testing of the Deep-autoencoder-based Fuzzy C-Means (DFCM) algorithm for topic detection on a corpus of scientific publications shows that DFCM performs well and can outperform standard algorithms widely used for topic detection, such as Non-negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA), on large corpora. Analysis of the clustering results on scientific publication data provides an overview of developments and of the topics that became "highlights" in a given period, helps identify research gaps and research characteristics, and helps predict which research topics are promising in the future."

Depok: Fakultas Teknik Universitas Indonesia, 2021

D-pdf

UI - Disertasi Membership Universitas Indonesia Library

Darin Ramadhiani Gita Wijaya

Penerapan Analisis Sentimen dan Pendeteksian Topik pada Ulasan Pengguna Aplikasi Mypertamina di Play Store = Implementation of Sentiment Analysis and Topic Detection on MyPertamina App User Reviews in Play Store

"As a state-owned enterprise in the energy sector, PT Pertamina (Persero) must ensure that subsidized fuel (BBM Subsidi) is distributed to its intended recipients and is not misused. To that end, starting July 1, 2022, Pertamina began a pilot of the Subsidi Tepat ("Precise Subsidy") program, under which subsidized fuel consumers who own four-wheeled vehicles must register their vehicles in order to purchase Pertalite or Biosolar. One way to register for the Subsidi Tepat program is through the MyPertamina digital application, a loyalty application for all customers of Pertamina products that can be downloaded from the Play Store. As of early March 2023, the MyPertamina application had been downloaded more than 10 million times from the Play Store. However, its user rating in the Play Store is only 2.9/5. This rating is relatively low compared to other government service applications with a similar number of downloads. Given the large number of downloads and the low user rating, user reviews need to be analyzed to assess the performance of the MyPertamina application. Accordingly, this research applies topic detection using the BERT-EFCM model to analyze the topics discussed in user reviews of the MyPertamina application in the Play Store, and applies sentiment analysis using the BERT-NN model to analyze the sentiment expressed on each of those topics. The results show three topics discussed regarding the MyPertamina application: use of the application to purchase fuel at gas stations, registration and services related to the application, and user evaluations of the application. On every topic, the majority of users express negative sentiment, with the following ratios: 84% negative and 16% positive for the first topic, 85% negative and 15% positive for the second topic, and 80% negative and 20% positive for the third topic."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Yudhistira Jinawi Agung

Analisis Sensitivitas Parameter Model EFCM Berbasis BERT untuk Pendeteksian Topik = Parameter Sensitivity Analysis of BERT-based EFCM Model for Topic Detection

"Topic detection is the process of obtaining the subject matter or topics of a text document. On large datasets, topic detection can be done more efficiently using machine learning methods. Clustering is a machine learning method that aims to group data with similar characteristics into a group, or cluster. Examples of clustering methods include K-Means, Fuzzy C-Means (FCM), and Eigenspace-Based Fuzzy C-Means (EFCM). Clustering methods process only numeric data, so a text representation method is needed. Commonly used earlier text representation methods are Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TFIDF). However, BoW and TFIDF are not good at representing text contextually. In 2018 a new text representation method was introduced, namely Bidirectional Encoder Representations from Transformers (BERT). The BERT model can represent text contextually and produces high-dimensional text representations. EFCM is a clustering technique that combines the Truncated Singular Value Decomposition (TSVD) dimension reduction technique with the FCM clustering technique. In 2022, a study combined BERT and EFCM for topic detection. The combined BERT and EFCM model has several parameters that can be set, including the choice of BERT encoder layer, the EFCM dimension, and the degree of fuzziness. This study focuses on parameter sensitivity analysis to see how the parameter values affect the performance of the BERT-based EFCM model for topic detection. The sensitivity analysis uses the Sobol method to determine which parameters are insensitive and which are the most sensitive. Model performance is evaluated using the topic coherence, topic diversity, and topic quality metrics. The results show that model performance is sensitive to the encoder layer, the EFCM dimension, and the degree of fuzziness. In addition, optimal models were obtained for three datasets using grid search parameter tuning. Parameter tuning improves model performance on all three datasets as measured by topic quality."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
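
The abstract above mentions grid search tuning over the BERT encoder layer, the EFCM dimension, and the degree of fuzziness. A minimal sketch of exhaustive grid search over such a parameter grid. The candidate values and the `toy_evaluate` scoring function are hypothetical placeholders; in the thesis setting the score would come from fitting the model and measuring topic quality, which is not reproduced here:

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and keep the best-scoring one.

    param_grid maps a parameter name to a list of candidate values.
    evaluate takes a {name: value} dict and returns a score (higher is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical candidate values; not taken from the thesis.
param_grid = {
    "encoder_layer": [10, 11, 12],
    "efcm_dim": [5, 10, 20],
    "fuzziness": [1.1, 1.5, 2.0],
}

def toy_evaluate(params):
    # Placeholder score that peaks at an arbitrary combination.
    return -(abs(params["encoder_layer"] - 11)
             + abs(params["efcm_dim"] - 10)
             + abs(params["fuzziness"] - 1.5))

best_params, best_score = grid_search(param_grid, toy_evaluate)
```

The cost grows with the product of the list lengths (27 evaluations here), which is one reason a sensitivity analysis that identifies insensitive parameters can help shrink the grid.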

Evan Haryowidyatna

Analisis Pengelompokan Kabupaten dan Kota di Pulau Jawa Sebagai Sasaran Industri Sepeda Motor dengan Metode Partitional Hard Clustering = Clustering Analysis of Districts and Cities in The Island of Java as Targets of Motorcycle Industry Using Partitional Hard Clustering Method

"As of February 9, 2023, motorcycles make up 87% of all private vehicles in Indonesia. The densest concentration of motorcycles in Indonesia is on the island of Java, which accounts for 60%. The large motorcycle population and the fact that 80% of households in Java already own a motorcycle are causing the motorcycle market to shrink. In the long run, this can hurt a motorcycle industry that wants to keep growing. This research clusters the regencies and cities of Java based on their demographic characteristics. It then offers decision recommendations for the motorcycle industry based on the groups of regencies and cities formed by clustering. The aim is to enable manufacturers in the motorcycle industry to focus their products on the groups of regencies and cities with the best potential. The research uses 12 demographic variables, divided into three categories: the economic conditions of the population, the living conditions of the population, and the demographic conditions of the region. The method used is partitional hard clustering. First, a dataset is built by scraping data from trusted sites, followed by Exploratory Data Analysis (EDA) on the dataset. Once the dataset is formed, clustering is performed using partitional hard clustering methods, namely K-Means Clustering and K-Medoids Clustering. Cluster evaluation is then carried out to determine the most suitable clustering method using four evaluation metrics: the Silhouette Index, Dunn Index, Davies-Bouldin Index, and Calinski-Harabasz Index. The results show that K-Medoids Clustering with 5 clusters is the best for grouping the regencies and cities of Java. After the groups are formed, each group is given recommendations for decisions the motorcycle industry should take. There are four recommendations: spare parts distribution, workshop establishment, sales of mid- to high-end motorcycles, and sales of mid-range motorcycles and below."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
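
The abstract above evaluates clusterings with the Silhouette Index, among other metrics. A minimal sketch of the mean silhouette score under the standard definition with Euclidean distance. The toy one-dimensional points and labels are illustrative and unrelated to the thesis data:

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance.

    For point i: a = mean distance to the other points in its cluster,
    b = smallest mean distance to the points of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters score 0.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)

# Toy data: two obvious groups on a line.
points = [[0.0], [1.0], [10.0], [11.0]]
good = silhouette_score(points, [0, 0, 1, 1])  # groups match the data
bad = silhouette_score(points, [0, 1, 0, 1])   # groups cut across the data
```

Values near 1 indicate compact, well-separated clusters, while negative values indicate points that sit closer to another cluster than to their own. Comparing such scores across methods and cluster counts is one way to choose between candidates like K-Means and K-Medoids.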

Widi Nugroho

Analisis Kinerja Metode Convolutional Neural Network (CNN) Arsitektur ResNet50 dalam Mengklasifikasi Penyakit Retinopathy of Prematurity Pada Citra Fundus Retina = Performance Analysis of the ResNet50 Architecture Convolutional Neural Network (CNN) Method in Classifying Retinopathy of Prematurity Diseases on Retinal Fundus Images

"Premature infants are babies born before 37 weeks of gestation. Their nervous systems and organs are not yet fully developed, which puts them at higher risk of various health problems. One such problem affects the eyes, which are important to an infant's development. Retinopathy of Prematurity (ROP) is an eye disease of premature infants caused by abnormal formation of retinal blood vessels. Diagnosis by ophthalmologists has not kept pace with the rising number of ROP cases, so this study uses a deep learning approach to classify ROP severity in retinal fundus images. The deep learning method is a Convolutional Neural Network (CNN) with the ResNet50 architecture. The data are secondary data obtained from the online database Kaggle: 90 retinal fundus images, comprising 38 non-ROP images, 19 ROP Stage 1 images, 22 ROP Stage 2 images, and 11 ROP Stage 3 images. During data preparation, image contrast is enhanced using Contrast Limited Adaptive Histogram Equalization (CLAHE) and image masking is applied. The images are then resized to 224×224. The data are augmented with horizontal flips and rotations to enlarge the dataset, which is then split into 80% training data and 20% testing data; 20% of the training data is set aside for validation. The ResNet50 model is trained with a batch size of 64, a learning rate of 0.001, 30 epochs, the Adam (Adaptive Moment Estimation) optimizer, and the categorical cross-entropy loss function.
Training was repeated five times. Averaged over the five runs, the model reached a training accuracy of 99.714% and a validation accuracy of 92.85%, with a training loss of 0.01864 and a validation loss of 0.18434. On the test data, the average accuracy was 97.352%, the average loss was 0.0986374, and the reported AUROC (Area Under the Receiver Operating Characteristic curve) was 0.0955. In addition to the performance evaluation, GradCAM is used to visualize the image features the model treats as important, which may help doctors evaluate ROP."
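The split described in the abstract (80% training and 20% testing, then 20% of the training portion held out for validation) can be worked through as a short sketch. The class counts are the ones reported for the 90 original images. The function name `split_fractions` is hypothetical, and the abstract does not say how large the dataset is after augmentation or whether the split is stratified, so only proportions are computed here.

```python
from fractions import Fraction

# Class counts reported in the abstract (original images, before augmentation).
CLASS_COUNTS = {
    "non-ROP": 38,
    "ROP Stage 1": 19,
    "ROP Stage 2": 22,
    "ROP Stage 3": 11,
}


def split_fractions(test_share, val_share_of_train):
    """Return the overall share of data in each subset.

    test_share is taken from the whole dataset; val_share_of_train is
    taken from what remains after the test set is removed.
    """
    train_pool = 1 - test_share
    validation = train_pool * val_share_of_train
    train = train_pool - validation
    return {"train": train, "validation": validation, "test": test_share}


shares = split_fractions(Fraction(1, 5), Fraction(1, 5))
total_images = sum(CLASS_COUNTS.values())
class_shares = {name: Fraction(n, total_images) for name, n in CLASS_COUNTS.items()}

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {float(share):.0%} of all data")
    for name, share in class_shares.items():
        print(f"{name}: {float(share):.1%} of the original images")
```

Under these assumptions the effective proportions are 64% training, 16% validation, and 20% testing. The class shares also show that the smallest class (ROP Stage 3) makes up roughly one eighth of the original images.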

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Syach Riyan Muhammad Ardiyansyah

Analisis Kinerja Metode BERT-IDEC untuk Deteksi Topik = BERT-IDEC Method Performance Analysis for Topic Detection

"Topic detection is the process of analyzing text data to find the topics it contains. In today's digital era, topic detection is often used to analyze topics and to group information by topic. Machine learning makes topic detection faster and more efficient, especially for large text datasets. One machine learning approach to topic detection is clustering. However, high data dimensionality makes some clustering methods less effective for topic detection, so high-dimensional data must first undergo dimension reduction. Improved Deep Embedded Clustering (IDEC) is a clustering method that performs dimension reduction and clustering simultaneously. This study therefore performs topic detection with the IDEC clustering method. The data used are online news data from AG News, Yahoo! Answer, and R2. IDEC cannot take raw text as input; the text must first be converted into a vector representation. This study uses Bidirectional Encoder Representation from Transformers (BERT) as the text representation method. The text is first converted by BERT into vector representations, which IDEC then takes as input to perform topic detection. The simulations compare the performance of the IDEC model with BERT text representations against the IDEC model with TF-IDF text representations. The results show that the IDEC model with BERT text representations outperforms the IDEC model with TF-IDF text representations."
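As a rough illustration of the clustering step in DEC-style methods such as IDEC, the sketch below computes soft cluster assignments with a Student's t kernel and the sharpened target distribution that the clustering loss pulls them toward. These formulas follow the commonly published DEC/IDEC formulation and are not taken from this abstract. The toy 2-D points stand in for embedded BERT vectors and are purely hypothetical, as are the function names. The autoencoder and its reconstruction loss, which IDEC optimizes jointly with the clustering loss, are omitted.

```python
import math


def soft_assign(points, centers, alpha=1.0):
    """Soft assignment q[i][j] of point i to cluster j (Student's t kernel)."""
    q = []
    for z in points:
        row = []
        for mu in centers:
            dist_sq = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([value / total for value in row])
    return q


def target_distribution(q):
    """Sharpened targets p[i][j], proportional to q[i][j]**2 / sum_i q[i][j]."""
    n_clusters = len(q[0])
    cluster_freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / cluster_freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """Clustering loss KL(P || Q), summed over all points and clusters."""
    return sum(
        p_ij * math.log(p_ij / q_ij)
        for p_row, q_row in zip(p, q)
        for p_ij, q_ij in zip(p_row, q_row)
    )


# Hypothetical low-dimensional embeddings and cluster centers.
embedded = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
centers = [[0.0, 0.0], [5.0, 5.0]]

q = soft_assign(embedded, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In a full implementation the embeddings would come from an encoder applied to the text vectors, and the centers and encoder weights would be updated by gradient descent on this loss together with the reconstruction loss.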

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2022

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
