Search Results

Found 129693 documents matching the query

Gabriella Kurniawan

Perbandingan K-means dan spherical K-means untuk klasifikasi data hepatitis = Comparison of K-means and spherical K-means for classification of hepatitis data

"ABSTRACT

Hepatitis merupakan penyakit peradangan pada hati yang dapat disebabkan oleh virus hepatitis. Di antara lima jenis hepatitis, hepatitis B dan hepatitis C merupakan jenis hepatitis yang dapat berkembang menjadi kanker hati. Kanker hati merupakan jenis kanker nomor tujuh tertinggi di dunia dan nomor tiga yang menyebabkan kematian karena kanker. Seseorang yang memiliki gejala penyakit hepatitis dapat melakukan serangkaian uji laboratorium untuk melihat kondisi kesehatannya. Hasil laboratorium hepatitis dapat kita manfaatkan untuk membentuk suatu program yang dapat mengklasifikasi hepatitis B dan hepatitis C. K-Means Clustering merupakan salah satu metode clustering yang dapat dimanfaatkan untuk mengklasifikasi hepatitis B dan hepatitis C. K-Means Clustering cukup mudah untuk diimplementasikan dan waktu yang digunakan untuk mengolah data juga cukup sedikit sehingga, metode ini cukup baik untuk mengklasifikasi data hepatitis B dan hepatitis C. Sementara, Spherical K-Means merupakan metode lanjutan dari K-Means Clustering. Hasil klasifikasi dari dua buah metode akan digunakan untuk melihat akurasi dari kedua buah metode dan membandingkan kedua metode tersebut.

ABSTRACT

Hepatitis is an inflammatory disease of the liver that can be caused by hepatitis viruses. Among the five types of hepatitis, hepatitis B and hepatitis C are the types that can develop into liver cancer. Liver cancer ranks seventh worldwide in number of cancer cases and third in cancer deaths. A person with symptoms of hepatitis can undergo a series of laboratory tests to assess their health condition. These laboratory results can be used to build a program that classifies hepatitis B and hepatitis C. K-Means Clustering is a clustering method that can be used to classify hepatitis B and hepatitis C data. It is fairly easy to implement and needs little time to process the data, which makes it a reasonable choice for classifying hepatitis B and hepatitis C data. Spherical K-Means is an extension of K-Means Clustering. The classification results of the two methods are used to assess the accuracy of each and to compare them."

2018

S-Pdf

UI - Skripsi Membership Universitas Indonesia Library

Ajeng Leudityara Fijri

Klasifikasi kanker payudara menggunakan kernel spherical K-means = Breast cancer clustering using kernel spherical K-means

"ABSTRACT

Kanker payudara adalah pertumbuhan sel-sel abnormal di jaringan pada payudara yang berkembang secara tidak terkendali. Perkembangan sel-sel abnormal secara tidak terkendali ini menyebabkan kanker menjadi salah satu penyakit paling mematikan yang umumnya dialami oleh wanita di seluruh dunia. Salah satu cara untuk mengurangi berkembangnya sel kanker ini adalah dengan melakukan pendeteksian dini menggunakan machine learning. Beberapa metode machine learning berhasil melakukan klasifikasi kanker. Clustering merupakan salah satu metode dari machine learning yang bertujuan untuk mengelompokkan suatu dataset ke dalam subset berdasarkan ukuran jarak. Kernel Spherical K-Means (KSPKM) adalah salah satu metode clustering dengan mengganti hasil kali dalam yang ada pada Spherical K-Means (SPKM) dengan fungsi Kernel. Data kanker payudara yang digunakan pada penelitian ini adalah data kanker payudara Coimbra. Data kanker payudara Coimbra ini merupakan hasil dari pengambilan tes laboratorium yang dapat mendeteksi kanker payudara pada tubuh. Hasil klasifikasi data kanker payudara Coimbra dengan menggunakan metode SPKM memiliki hasil akurasi sebesar 81,82% dengan running time selama 0,16 detik, sensitivity sebesar 100%, dan specificity sebesar 65,62% sedangkan hasil akurasi dengan menggunakan KSPKM dengan Radial Basis Function (RBF) adalah 72,41% dengan running time 0,98 detik, sensitivity sebesar 61,54%, dan specificity sebesar 81,25%. Berdasarkan hasil akurasi pada 10% sampai 90% data yang digunakan, metode KSPKM menghasilkan akurasi yang lebih stabil dibandingkan hasil akurasi pada metode SPKM.

ABSTRACT

Breast cancer is the uncontrolled growth of abnormal cells in breast tissue. This uncontrolled growth makes cancer one of the deadliest diseases commonly experienced by women worldwide. One way to limit the development of cancer cells is early detection using machine learning. Several machine learning methods have successfully classified cancer. Clustering is a machine learning method that groups a dataset into subsets based on a distance measure. Kernel Spherical K-Means (KSPKM) is a clustering method that replaces the inner products in Spherical K-Means (SPKM) with kernel functions. The breast cancer data used in this study are the Coimbra breast cancer data, which come from laboratory tests that can detect breast cancer in the body. On the Coimbra data, SPKM achieved a highest accuracy of 81.82% with a running time of 0.16 seconds, sensitivity of 100%, and specificity of 65.62%, while KSPKM with a Radial Basis Function (RBF) kernel achieved a highest accuracy of 72.41% with a running time of 0.98 seconds, sensitivity of 61.54%, and specificity of 81.25%. Across runs using 10% to 90% of the data for training, KSPKM produced more stable accuracy than SPKM."
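The accuracy, sensitivity, and specificity values quoted in this abstract are presumably the standard confusion-matrix ratios. A minimal Python sketch follows; the function name `binary_metrics` and the counts are hypothetical and are not taken from the thesis.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    Assumes at least one positive and one negative sample, so that
    neither denominator is zero.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total      # share of all samples labelled correctly
    sensitivity = tp / (tp + fn)      # true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts, for illustration only (not the thesis results).
acc, sens, spec = binary_metrics(tp=8, fn=2, tn=6, fp=4)
```

A sensitivity of 100% with a lower specificity, as reported for SPKM, would correspond to `fn == 0` together with a nonzero `fp`.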

2018

S-Pdf

UI - Skripsi Membership Universitas Indonesia Library

Ardibian Krismanti

Klasifikasi kanker otak (astrocytoma) menggunakan principal component analysis dan spherical k-means

Depok: Universitas Indonesia, 2010

S27787

UI - Skripsi Open Universitas Indonesia Library

Ardibian Krismanti

Klasifikasi kanker otak (astrocytoma) menggunakan principal component analysis dan spherical k-means

"Dari pemeriksaan MRI, diperoleh gambar jaringan otak, yang akan digunakan oleh proton MRS untuk menentukan konsentrasi metabolit otak pada jaringan yang didiagnosa astrocytoma, seperti metabolit NAA, choline, creatine, Lipid, Lactate, Myoinositol, dan Glutamine-glutamate. Dari hasil MRS ini, astrocytoma dapat diklasifikasi berdasarkan derajat keganasannya (grade), yaitu high grade dan low grade. Proses klasifikasi astrocytoma, biasa dilakukan secara manual oleh ahli patologi atau secara statistik. Dalam skripsi ini, akan dibahas proses klasifikasi astrocytoma menjadi tiga kelas derajat keganasan dengan menggunakan metode Principal Component Analysis (PCA) dan Spherical K-Means terhadap data MRS. Algoritma Spherical K-Means merupakan algoritma K- Means dengan cosine similarity. Sedangkan PCA merupakan teknik yang digunakan untuk mencari vektor-vektor basis subruang tiap kelas (grade). Vektor-vektor basis ini akan membangun Principal Component yang akan digunakan dalam pengidentifikasian grade suatu data MRS. Data yang digunakan dalam skripsi ini adalah data yang berasal dari laboratorium radiologi Rumah Sakit Cipto Mangunkusumo (RSCM), Jakarta. Hasil penelitian yang dilakukan pada skripsi ini, diketahui bahwa PCA dapat mengklasifikasi astrocytoma dengan akurasi tertinggi, yaitu 85%. Selain itu, dari penelitian ini dihasilkan perangkat lunak yang dapat digunakan untuk membantu pengambilan keputusan yang terkait dengan klasifikasi astrocytoma menjadi high grade, low grade, dan normal.

MRI provides images of brain tissue, which proton MRS uses to determine the concentration of brain metabolites in tissue diagnosed with astrocytoma, such as NAA, choline (Cho), creatine (Cr), lipid (Lip), lactate (Lac), myo-inositol (MI), and glutamine-glutamate (Glx). From the MRS results, astrocytoma can be classified by grade of malignancy into high grade and low grade. This classification is usually done manually by a pathologist or statistically. This thesis discusses classifying astrocytoma into three malignancy classes by applying Principal Component Analysis (PCA) and Spherical K-Means to MRS data. The Spherical K-Means algorithm is the K-Means algorithm with cosine similarity. PCA is a technique used to find the basis vectors of each class (grade) subspace. These basis vectors build the principal components used to identify the grade of an MRS sample. The data come from the radiology laboratory of Rumah Sakit Cipto Mangunkusumo (RSCM), Jakarta. The results show that PCA classifies astrocytoma with the highest accuracy, 85%. The research also produced software that can assist decisions related to classifying astrocytoma as high grade, low grade, or normal."
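The statement that Spherical K-Means is K-Means with cosine similarity can be illustrated with a small pure-Python sketch. This is a generic textbook formulation under stated assumptions, not the thesis implementation: the helper names, the fixed iteration count, the random initialization, and the toy data are all hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(data, k, iterations=20, seed=0):
    """Cluster rows of `data` by cosine similarity; returns (labels, centroids)."""
    rng = random.Random(seed)
    points = [normalize(row) for row in data]
    centroids = rng.sample(points, k)  # random initial centroids (an assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: on unit vectors the dot product equals cosine similarity.
        labels = [max(range(k), key=lambda j: dot(p, centroids[j])) for p in points]
        # Update step: sum each cluster's members, then project back onto the unit sphere.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


# Toy data: two rows point roughly along the x-axis, two along the y-axis.
toy = [[1.0, 0.1], [2.0, 0.3], [0.1, 1.0], [0.2, 3.0]]
labels, centroids = spherical_kmeans(toy, k=2)
```

Because every row is normalized first, only the direction of a sample matters, not its magnitude; that is the main practical difference from Euclidean K-Means.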

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2010

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Dwianti Westari

PERBANDINGAN PERFORMA METODE NORMALISASI MIN-MAX DAN Z-SCORE PADA TEKNIK K-MEANS UNTUK KLASIFIKASI PASIEN DIABETES = PERFORMANCE COMPARISON OF MIN-MAX AND Z-SCORE NORMALIZATION METHOD IN K-MEANS TECHNIQUE FOR DIABETES PATIENT CLASSIFICATION

"Sistem klasifikasi diabetes sangat berguna di bidang kesehatan. Dataset Pima Indian Diabetes (PID) digunakan untuk melatih dan mengevaluasi algoritma ini. Rentang nilai yang tidak seimbang pada atribut mempengaruhi kualitas hasil klasifikasi, sehingga perlu dilakukan preprocess data yang diharapkan dapat meningkatkan akurasi dari dataset hasil klasifikasi PID. Dua jenis metode yang digunakan yaitu normalisasi min-max dan normalisasi z-score. Kedua metode normalisasi ini digunakan dan akurasi klasifikasi dibandingkan. Sebelum dilakukan proses klasifikasi data, data dibagi menjadi data latih dan data uji. Hasil pengujian klasifikasi menggunakan algoritma K-Means menunjukkan bahwa akurasi terbaik terletak pada dataset PID yang telah dinormalisasi menggunakan metode normalisasi min-max, yaitu 79% dibandingkan dengan normalisasi z-score.

A diabetes classification system is very useful in the health sector. The Pima Indian Diabetes (PID) dataset is used to train and evaluate the algorithm. Unbalanced value ranges across attributes affect the quality of the classification result, so the data need preprocessing, which is expected to improve classification accuracy on the PID dataset. Two methods are used: min-max normalization and z-score normalization. Both are applied and the resulting classification accuracies are compared. Before classification, the data are divided into training data and test data. Classification tests using the K-Means algorithm show that the best accuracy, 79%, is obtained on the PID dataset normalized with min-max normalization rather than z-score normalization."
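The two normalization methods compared in this abstract can be sketched as follows. This is a minimal illustration applied to one attribute column at a time; the function names are hypothetical, and the use of the population standard deviation in the z-score is an assumption, since the abstract does not say which variant was used.

```python
import statistics


def min_max(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values on the mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population, not sample, deviation
    if std == 0:  # constant column
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


column = [2.0, 4.0, 6.0]          # hypothetical attribute values
scaled = min_max(column)          # [0.0, 0.5, 1.0]
standardized = z_score(column)    # symmetric around 0
```

Min-max bounds every attribute to the same range, while z-score leaves the range unbounded; this affects the Euclidean distances that K-Means relies on.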

Depok: Fakultas Teknik Universitas Indonesia, 2020

T-Pdf

UI - Tesis Membership Universitas Indonesia Library

Wu, Junjie

Advances in K-means clustering: a data mining thinking

"This book addresses these challenges and makes novel contributions in establishing theoretical frameworks for K-means distances and K-means based consensus clustering, identifying the "dangerous" uniform effect and zero-value dilemma of K-means, adapting right measures for cluster validity, and integrating K-means with SVMs for rare class analysis. This book not only enriches the clustering and optimization theories, but also provides good guidance for the practical use of K-means, especially for important tasks such as network intrusion detection and credit fraud prediction. The thesis on which this book is based has won the "2010 National Excellent Doctoral Dissertation Award", the highest honor for not more than 100 PhD theses per year in China."

Berlin: Springer-Verlag, 2012

e204063793

eBooks Universitas Indonesia Library

Julizar Isya Pandu Wangsa

Studi Perbandingan Metode Clustering K-Means, DBSCAN, dan HDBSCAN pada BERTopic untuk Pendeteksian Topik = Comparative Study of K-Means, DBSCAN, and HDBSCAN Clustering Methods on BERTopic for Topic Detection

"Pendeteksian topik merupakan suatu proses pengidentifikasian suatu tema sentral yang ada dalam kumpulan dokumen yang luas dan tidak terorganisir. Hal ini merupakan hal sederhana yang bisa dilakukan secara manual jika data yang ada hanya sedikit. Untuk data yang banyak dibutuhkan pengolahan yang tepat agar representasi topik dari setiap dokumen didapat dengan cepat dan akurat sehingga machine learning diperlukan. BERTopic adalah metode pemodelan topik yang memanfaatkan teknik clustering dengan menggunakan model pre-trained Bidirectional Encoder Representations from Transformers (BERT) untuk melakukan representasi teks dan Class based Term Frequency Invers Document Frequency (c-TF-IDF) untuk ekstraksi topik. Metode clustering yang digunakan pada penelitian ini adalah metode K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), dan Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). BERT dipilih sebagai metode representasi teks pada penelitian ini karena BERT merepresentasikan suatu kalimat berdasarkan sequence-of-word dan telah memperhatikan aspek kontekstual kata tersebut dalam kalimat. Hasil representasi teks merupakan vektor numerik dengan dimensi yang besar sehingga perlu dilakukan reduksi dimensi menggunakan Uniform Manifold Approximation and Projection (UMAP) sebelum clustering dilakukan. Model BERTopic dengan tiga metode clustering ini akan dianalisis kinerjanya berdasarkan matrik nilai coherence, diversity, dan quality score. Nilai quality score merupakan perkalian dari nilai coherence dengan nilai diversity. Hasil simulasi yang didapat adalah model BERTopic menggunakan metode clustering K-Means lebih unggul 2 dari 3 dataset untuk nilai quality score dari kedua metode clustering yang ada.

Topic detection is the process of identifying a central theme in a large, unorganized collection of documents. This is simple enough to do manually when there is only a small amount of data. For large amounts of data, proper processing is needed to obtain the topic representation of each document quickly and accurately, so machine learning is required. BERTopic is a topic modeling method that uses clustering techniques, with pre-trained Bidirectional Encoder Representations from Transformers (BERT) models for text representation and Class-based Term Frequency Inverse Document Frequency (c-TF-IDF) for topic extraction. The clustering methods used in this research are K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). BERT was chosen as the text representation method because it represents a sentence as a sequence of words and takes into account the context of each word in the sentence. The resulting text representation is a high-dimensional numeric vector, so the dimensions are reduced using Uniform Manifold Approximation and Projection (UMAP) before clustering. The performance of the BERTopic model with each of the three clustering methods is analyzed using the coherence, diversity, and quality score metrics. The quality score is the product of the coherence value and the diversity value. The simulation results show that the BERTopic model using K-Means clustering outperforms the other two clustering methods in quality score on 2 of the 3 datasets."
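The quality score described in this abstract is the product of coherence and diversity. The sketch below illustrates that arithmetic. The diversity definition used here (the fraction of unique words among all topics' top words) is a common one but is an assumption, since the abstract does not define it, and the coherence value and topic words are hypothetical.

```python
def topic_diversity(topics):
    """Fraction of unique words among the top words of all topics (assumed definition)."""
    all_words = [word for topic in topics for word in topic]
    return len(set(all_words)) / len(all_words)


def quality_score(coherence, diversity):
    """Quality score as the product of coherence and diversity."""
    return coherence * diversity


# Hypothetical top words for two topics; "virus" appears in both.
topics = [["liver", "virus", "cancer"], ["cluster", "data", "virus"]]
diversity = topic_diversity(topics)       # 5 unique words out of 6
score = quality_score(0.5, diversity)     # 0.5 is a made-up coherence value
```

Multiplying the two values penalizes a model that scores well on one criterion only, for example coherent topics that all share the same words.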

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Dwie Putri Donnaro

Penggunaan Algoritma Clustering K-Means, DBScan, LDA, dan Kombinasi K-Means dengan DBScan untuk Menentukan Trending Topic pada Media Sosial X = Use of K-Means Clustering, DBScan, LDA, and Combination of K-Means with DBScan to Determine Trending Topic on Social Media X

Masyarakat Indonesia sangat sering menggunakan media sosial twitter dan sekarang lebih dikenal dengan X untuk berbagi foto, video atau membuat tweet tentang topic yang sedang trend. Namun tidak banyak dari masyarakat Indonesia yang memanfaatkan trending topic ini untuk membuat konten dalam memasarkan produk barunya. Pada penelitian ini telah dilakukan pengelompokan trending topic dengan menggunakan 3 algoritma clustering yaitu K-Means, DBScan dan LDA dengan menggunakan 2 kondisi yaitu Menggunakan Kata Kunci dan Tanpa Menggunakan Kata Kunci, untuk kategori cluster telah ditentukan yaitu Cluster Politik, Cluster Ekonomi dan Cluster Pendidikan. Hasil penelitian ini adalah K-Means dengan menggunakan kata kunci lebih baik dari pada semuanya yaitu dengan nilai validitas 0,5810 sedangkan diposisi kedua yang termasuk baik adalah DBScan menggunakan kata kunci dengan nilai validitas 0,4656. Oleh karena itu karena hasilnya masih dalam tingkatan 2 yaitu struktur cluster masih dalam kategori baik, maka peneliti melakukan kombinasi antara K-Means dan DBScan dengan menggunakan kata kunci. Dan hasilnya struktur yang terbentuk masuk dalam tingkatan 1 yaitu dalam kategori kuat, nilai validitas yang dihasilkan yaitu 0,7864, sehingga antar trending topic dalam masing-masing cluster memiliki keterkaitan.

Indonesians frequently use the social media platform Twitter, now better known as X, to share photos and videos or to tweet about trending topics. However, few Indonesians use these trending topics to create content for marketing their new products. In this study, trending topics were clustered using three algorithms, K-Means, DBScan, and LDA, under two conditions, with keywords and without keywords, with the cluster categories fixed in advance as Politics, Economy, and Education. K-Means with keywords performed best, with a validity value of 0.5810, followed by DBScan with keywords, with a validity value of 0.4656. Because these results were still at level 2, meaning the cluster structure fell in the good category, the researchers combined K-Means and DBScan using keywords. The resulting structure reached level 1, the strong category, with a validity value of 0.7864, indicating that the trending topics within each cluster are related."

Depok: Fakultas Teknik Universitas Indonesia, 2024

T-pdf

UI - Tesis Membership Universitas Indonesia Library

Nova Yuniarti

Penerapan algoritma K-Means clustering pada pengelompokan barisan DNA virus hepatitis B (HBV) = Application of K-Means algorithm in clustering the DNA sequences of hepatitis B virus (HBV) / Nova Yuniarti

"[ABSTRAK

Berdasarkan data WHO tahun 2014, diperkirakan sekitar 15 juta orang di dunia yang terinfeksi hepatitis B (HBsAg+) juga terinfeksi hepatitis D. Infeksi hepatitis D dapat terjadi bersamaan (koinfeksi) atau setelah seseorang terkena hepatitis B kronis (superinfeksi). Penyakit hepatitis B disebabkan oleh virus HBV dan penyakit hepatitis D disebabkan oleh virus HDV. HDV tidak dapat hidup tanpa HBV. Hepatitis D erat hubungannya dengan infeksi virus HBV, sehingga sangat realistis bila setiap usaha pencegahan terhadap hepatitis B, maka secara tidak langsung mencegah hepatitis D. Pada tesis ini akan dibahas bagaimana hasil pengelompokan barisan DNA HBV menggunakan algoritma k-means clustering dengan menggunakan perangkat lunak R. Dimulai dengan mengumpulkan barisan DNA HBV yang diambil dari GenBank, kemudian dilakukan ekstraksi ciri menggunakan n-mers frequency, dan hasil ekstraksi ciri barisan DNA tersebut dikumpulkan dalam sebuah matriks dan dilakukan normalisasi menggunakan normalisasi min-max dengan interval [0, 1] yang akan digunakan sebagai data masukan. Jumlah cluster yang dipilih dalam penelitian ini adalah dua dan penentuan centroid awal dilakukan secara acak. Pada setiap iterasi dihitung jarak masing-masing objek ke masing-masing centroid dengan menggunakan Euclidean distance dan dipilih jarak terpendek untuk menentukan keanggotaan objek di suatu cluster sampai akhirnya terbentuk dua cluster yang konvergen. Hasil yang diperoleh adalah virus HBV yang berada pada cluster pertama lebih ganas dibanding virus HBV yang berada pada cluster kedua, sehingga virus HBV pada cluster pertama berpotensi berevolusi dengan virus HDV menjadi penyebab penyakit hepatitis D.

ABSTRACT

Based on 2014 WHO data, an estimated 15 million people worldwide who are infected with hepatitis B (HBsAg+) are also infected with hepatitis D. Hepatitis D infection can occur simultaneously with hepatitis B (co-infection) or after a person has contracted chronic hepatitis B (superinfection). Hepatitis B is caused by HBV and hepatitis D by HDV. HDV cannot survive without HBV. Hepatitis D is closely related to HBV infection, so every effort to prevent hepatitis B also indirectly prevents hepatitis D. This thesis discusses the clustering of HBV DNA sequences using the k-means clustering algorithm in R. The process starts by collecting HBV DNA sequences from GenBank, then extracting features from them using n-mers frequency; the extracted features are collected in a matrix and normalized with min-max normalization to the interval [0, 1] to serve as input data. The number of clusters is two, and the initial centroids are chosen randomly. In each iteration, the distance from every object to each centroid is calculated using the Euclidean distance, and the minimum distance determines cluster membership, until two convergent clusters are formed. As a result, the HBV viruses in the first cluster are more virulent than those in the second cluster, so the HBV viruses in the first cluster can potentially evolve with HDV viruses that cause hepatitis D., Based on WHO data, an estimated of 15 millions people worldwide who are