Subclustering is the process of breaking existing clusters of data into smaller ones based on their similarities, trends and patterns. For instance, a cluster on water leakage can be broken down into subclusters such as:
- leakage in the balcony
- leakage in the bathroom
- leakage in the kitchen, etc.
Why can subclustering be beneficial?
Subclustering helps you analyse data in a deeper, more fine-grained way. For example, a main cluster covering all insurance claims can be broken down into claims involving cars, houses, fire, flood, etc. Such a breakdown can make data analysis easier and more insightful.
Relevance AI's platform provides you with a no-code workflow to subcluster your clustered data with a few clicks.
Once you have uploaded, vectorized and clustered your data, select your dataset, click on Subcluster under Workflows and follow the guide. The images below show how to subcluster a dataset based on its 10 existing clusters using the Kmeans algorithm. This setup breaks each of the 10 clusters into two subclusters.
Each section in the setup is activated either by clicking on the small blue dot on its left-hand side, or by following the process in order: start with "Get started", fill in the data and click on "Continue".
- Which cluster would you like to drill deeper into? Select your desired existing clustering field from the menu.
- Which field do you want your subclusters based on? Select the vector field on which the subclustering should be based.
- Let's name your subcluster: Type in a name for the subclustering results.
- What kind of subclusters do you want to see? Select your desired clustering algorithm from the drop-down menu.
- How do you want your data clustered? Enter the number of subclusters each cluster should be broken into.
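To make the settings above concrete, here is a minimal sketch in plain Python of what subclustering does conceptually: points are grouped by their existing cluster label, and a small k-means is run inside each group. This is not Relevance AI's implementation. The function names (`kmeans`, `subcluster`), the `parent-sub` label format and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist  # available in Python 3.8+


def kmeans(points, k, iterations=20):
    """Tiny Lloyd's k-means; returns one label in range(k) per point."""
    # Deterministic farthest-point initialisation keeps the sketch reproducible.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def subcluster(vectors, parent_labels, n_subclusters=2):
    """Split every existing cluster into n_subclusters (hypothetical helper)."""
    by_parent = defaultdict(list)
    for index, parent in enumerate(parent_labels):
        by_parent[parent].append(index)
    result = [None] * len(vectors)
    for parent, indices in by_parent.items():
        k = min(n_subclusters, len(indices))  # cannot split below one point each
        sub_labels = kmeans([vectors[i] for i in indices], k)
        for i, sub in zip(indices, sub_labels):
            result[i] = f"{parent}-{sub}"  # assumed label format: parent-sub
    return result


# Toy data: two existing clusters, each hiding two tight groups.
vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0),
           (20.0, 0.0), (20.1, 0.0), (25.0, 5.0), (25.1, 5.0)]
parent_labels = [0, 0, 0, 0, 1, 1, 1, 1]
sub_labels = subcluster(vectors, parent_labels, n_subclusters=2)
print(sub_labels)
```

In terms of the workflow, `parent_labels` plays the role of the existing clustering field, `vectors` the vector field, and `n_subclusters` the number entered in the last step.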
Once you run subclustering, the workflow starts and the results are added to your dataset. Check the results under Dataset -> Monitor -> Clusters.
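If you also want to review the outcome outside the platform, a quick tally of documents per subcluster is often a useful first check. The sketch below assumes you already have each document's cluster and subcluster label as a pair. The records are invented for illustration, reusing the water leakage example, and do not reflect the platform's actual export format.

```python
from collections import Counter

# Hypothetical (cluster, subcluster) labels, one pair per document.
records = [
    ("water leakage", "bathroom"),
    ("water leakage", "kitchen"),
    ("water leakage", "bathroom"),
    ("water leakage", "balcony"),
    ("water leakage", "bathroom"),
    ("water leakage", "kitchen"),
]

# Count how many documents landed in each subcluster.
sizes = Counter(records)
for (cluster, sub), count in sorted(sizes.items()):
    print(f"{cluster} / {sub}: {count}")
```

Very small or very uneven subclusters can be a hint to rerun the workflow with a different number of subclusters.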