Welcome to the ML project page!

Last updated: 2024-12-08 21:04:23

This website serves content about the Machine Learning course project. In this project we will follow several stages:

Data annotation
Feature extraction
Model building and training
Competition

Data annotation

We now begin the first and one of the most important stages of our machine learning course project. Each of you has the task of annotating a portion of the dataset that will be provided as training data. Your models can only be as good as the data you annotate, so please pay special attention to this stage.

I have developed a system for distributing your annotation files. I created an assignment for data annotation (called Project-Annotation), and feedback has already been posted for it. Please use the personal code provided to you in the feedback section (it looks similar to CS412-ac7ba96d49fd). You will also use this code to access your personalized reports, so please make sure you can locate it.

Please download your data from 'http://www.onurvarol.com/CS412_2025_InstaInfluencers/data/tasks_CODE-HERE.csv', replacing CODE-HERE with the personal code provided to you. In this file you will find 150 links for your annotation tasks. Each link takes you to a Google survey; follow the instructions given there. You must complete all annotation tasks to finish this stage and continue to the next stages of the project. Since Instagram enforces a strict policy against scraping, please annotate about 25 accounts each day. You will also need an Instagram account to access profile information.
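As a rough illustration of the two practical steps above (substituting your code into the link and pacing the work), here is a minimal Python sketch. The helper names `task_url` and `daily_batches` are hypothetical, and the sketch assumes the code is inserted into the URL with no surrounding spaces. It does not download anything and makes no assumption about the column layout of the task file.

```python
# Hypothetical helpers: build the personal task-file URL and plan daily batches.
# Assumption: the personal code replaces CODE-HERE directly, with no spaces.
BASE_URL = "http://www.onurvarol.com/CS412_2025_InstaInfluencers/data/tasks_{code}.csv"


def task_url(code):
    """Return the task-file URL for a personal code such as CS412-ac7ba96d49fd."""
    return BASE_URL.format(code=code.strip())


def daily_batches(links, per_day=25):
    """Split a list of task links into consecutive chunks of at most per_day items."""
    if per_day <= 0:
        raise ValueError("per_day must be positive")
    return [links[i:i + per_day] for i in range(0, len(links), per_day)]
```

With 150 links and about 25 accounts per day, `daily_batches` yields six batches, so the annotation work spans roughly six days.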

Your grade and the success of your models will depend on your performance in this phase, so please allow adequate time and concentration for this phase.

Annotation Statistics

We can study how long it takes to annotate one instance. Below you can see the distribution of the number of seconds an annotator takes to process one instance.

We can analyze the progress of different annotators and how many instances they annotated for each category.

The score distribution for the annotation task is shown below. The final score has two components: the number of annotations completed (70%) and the mean accuracy of the annotations (30%).
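To make the 70/30 weighting concrete, here is a small sketch of one way such a score could be computed. Only the two weights come from this page. Measuring completion as a fraction of the assigned tasks, capping it at 1, and reporting the result on a 0 to 1 scale are all assumptions, and the function name `annotation_score` is hypothetical.

```python
def annotation_score(completed, assigned, mean_accuracy,
                     w_completion=0.7, w_accuracy=0.3):
    """Weighted score in [0, 1] from completion fraction and mean accuracy.

    Assumptions (not stated on the page): completion is completed / assigned,
    capped at 1, and mean_accuracy is already a value between 0 and 1.
    """
    if assigned <= 0:
        raise ValueError("assigned must be positive")
    if not 0.0 <= mean_accuracy <= 1.0:
        raise ValueError("mean_accuracy must be between 0 and 1")
    completion = min(completed / assigned, 1.0)
    return w_completion * completion + w_accuracy * mean_accuracy
```

Under these assumptions, an annotator who finishes all 150 tasks with a mean accuracy of 0.8 would score 0.7 + 0.3 × 0.8 = 0.94.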

Feature extraction and modelling

You will be provided with a sample pipeline for feature extraction and modelling when we release the information for the first round. Our competition continues with model training and evaluation. We are providing raw data for feature extraction and a labeled dataset for initial training. In the first round you will build models and submit your predictions as text files.

Please first download your own annotations from the link created for you: 'http://www.onurvarol.com/CS412_2025_InstaInfluencers/reports/report_CODE-HERE.html'. Replace CODE-HERE with the personal code provided to you. (The report will be available once the annotation task is completed!)

You can access additional training data by following the link below.
LINK FOR FILES
training-dataset.jsonl.gz: Training dataset. It contains user profiles and sample posts data
train-classification.csv: Training data labels. User name and category labels
test-classification-round*.dat: User names of accounts to be classified
test-regression-round*.jsonl: Posts of the accounts to be classified
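The files listed above come in two formats, gzipped JSON Lines and CSV, and both can be read with the Python standard library. The sketch below is a minimal, hedged example: the helper names `read_jsonl_gz` and `read_csv_rows` are hypothetical, and because this page does not describe the field names or whether the CSV has a header row, the helpers return the raw records without interpreting them.

```python
import csv
import gzip
import json


def read_jsonl_gz(path):
    """Yield one parsed JSON object per non-empty line of a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def read_csv_rows(path):
    """Return every row of a CSV file as a list of strings (any header row included)."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.reader(fh))
```

For example, `next(read_jsonl_gz("training-dataset.jsonl.gz"))` returns the first record, which is a convenient way to inspect the available fields before writing any feature extraction code.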

At the end of each round, we expect two JSON files. You MUST name them as shown below. The * symbol stands for the round number (1, 2, or 3).
prediction-classification-round*.json: JSON object where the key is the user name and the value is the predicted label.
prediction-regression-round*.json: JSON object where the key is the post id and the value is the predicted like count.
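Since the file names are mandatory, it may help to generate them in code rather than typing them by hand. The following is a minimal sketch under stated assumptions: the helper names `prediction_filename` and `write_predictions` are hypothetical, and the sketch only checks the naming pattern and writes a flat JSON object. It does not validate labels or post ids.

```python
import json
import os

VALID_TASKS = ("classification", "regression")
VALID_ROUNDS = (1, 2, 3)


def prediction_filename(task, round_no):
    """Build the required file name, e.g. prediction-classification-round1.json."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}")
    if round_no not in VALID_ROUNDS:
        raise ValueError(f"round_no must be one of {VALID_ROUNDS}")
    return f"prediction-{task}-round{round_no}.json"


def write_predictions(predictions, task, round_no, out_dir="."):
    """Write a {key: prediction} dict as one JSON object and return the file path."""
    path = os.path.join(out_dir, prediction_filename(task, round_no))
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(predictions, fh, ensure_ascii=False)
    return path
```

A call such as `write_predictions({"some_user": "some_label"}, "classification", 1)` would produce prediction-classification-round1.json in the current directory; the user name and label here are placeholders, not values from the dataset.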

You can also start from the notebook below to implement your approach, following the instructions given there.
LINK FOR NOTEBOOK (WILL BE ANNOUNCED SOON!)

Competition

We will run this competition in 3 rounds. Details of the competition can be found below.

R#	Status	Start date	End date	Results
1	COMPLETED	16/12/2024	25/12/2024	RESULTS PAGE
2	COMPLETED	26/12/2024	05/01/2025	RESULTS PAGE
3	ACTIVE	06/01/2025	10/01/2025	RESULTS PAGE