Dr. Gürol Canbek

Discover how data quality can make or break AI systems – dive into the ‘Garbage In, Garbage Out’ dilemma in my latest video podcast episode!



Welcome

Dr. Gürol Canbek is a seasoned computer engineering professional with over 25 years of active experience in leading and executing numerous software projects across major organizations such as HAVELSAN, ASELSAN, Takasbank, and the Ministry of National Defense. Throughout his career, he has taken on multiple roles, demonstrating versatility and expertise in complex technological environments.

In addition to his extensive industry experience, Dr. Canbek has made significant contributions to the field of machine learning and classification. His academic research focuses on advancing and benchmarking robust performance metrics, emphasizing the critical importance of dataset quality and of understanding the statistical distribution of features before feature selection and model development. Dr. Canbek has distinguished between performance measures and metrics, introduced a new category called "performance indicators," and provided a comprehensive review of binary classification performance measures. He has also developed a research and educational tool, TasKar, for calculating and visualizing these instruments. His work aims to establish a systematic approach to performance evaluation that minimizes bias in reported results.

Moreover, Dr. Canbek's recent work highlights the "Garbage In, Garbage Out" (GIGO) rationale, underscoring the importance of ensuring dataset quality in artificial intelligence applications to achieve high and generalizable performance. He has proposed a technique to quantify datasets based on feature frequency distribution characteristics, providing unique insights into feature prevalence. His research in this area, demonstrated through the analysis of Android mobile malware datasets, reveals critical differences in statistical distributions, offering a method to assess dataset sufficiency before feature selection and model building. Additionally, he developed a systematic dataset profiling approach to distinguish datasets collected from different sources, identifying their strengths and weaknesses.
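The profiling technique itself is described in the publications listed on this page. As a minimal sketch of the underlying quantity only, assuming each sample is encoded as a 0/1 feature vector (for example, whether an Android app requests a given permission), per-class feature prevalence can be computed as shown here. The function names, feature names, and toy data are hypothetical illustrations, not the author's method or real dataset values.

```python
from typing import Dict, Hashable, List, Sequence


def feature_frequencies(rows: Sequence[Sequence[int]], names: Sequence[str]) -> Dict[str, float]:
    """Fraction of samples in which each binary feature is present (value 1)."""
    if not rows:
        raise ValueError("need at least one sample")
    counts = [0] * len(names)
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row length does not match the number of features")
        for j, value in enumerate(row):
            if value:
                counts[j] += 1
    return {name: counts[j] / len(rows) for j, name in enumerate(names)}


def frequencies_by_class(rows, labels, names):
    """Per-class feature frequencies, e.g. malware vs. benign samples (hypothetical helper)."""
    groups: Dict[Hashable, List[Sequence[int]]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {label: feature_frequencies(group, names) for label, group in groups.items()}


# Toy data: hypothetical permission-style features, not taken from any real dataset.
names = ["SEND_SMS", "INTERNET", "READ_CONTACTS"]
rows = [[1, 1, 0], [1, 1, 1], [0, 1, 0], [0, 1, 0]]
labels = ["malware", "malware", "benign", "benign"]
print(frequencies_by_class(rows, labels, names))
```

In this toy example SEND_SMS appears in every malware sample (frequency 1.0) and in no benign sample (0.0), while INTERNET is present in all samples of both classes and therefore separates nothing. Differences of this kind in feature prevalence are what a frequency-based dataset profile is meant to surface.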

These contributions address two key concerns in modern AI development: performance evaluation and dataset quality/profiling.

My latest publications

1. PToPI

Springer's SN Computer Science published my research proposing a Periodic Table of Performance Instruments (PToPI): "PToPI: A Comprehensive Review, Analysis, and Knowledge Representation of Binary Classification Performance Measures/Metrics" [September 2022]

2. ACCBAR (Accuracy Barrier)

In this article, I propose a new category of performance instruments called 'indicators' and present its first instance, ACCBAR (Accuracy Barrier), which flags problematic cases of the widely used Accuracy metric. Accepted paper [September 2023]

3. BenchMetrics

My research on benchmarking classification performance metrics was published in the journal Neural Computing and Applications by Springer Nature (SCI, Q1). It recommends using the Matthews correlation coefficient (MCC) to evaluate binary-classification performance: "BenchMetrics: A systematic benchmarking method for binary-classification performance metrics" [August 2021]
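To illustrate why a metric such as MCC can be preferable to Accuracy, here is a minimal sketch using the standard textbook formulas on a hypothetical, imbalanced test set. This is not the BenchMetrics method itself, and returning 0.0 when MCC is undefined is an assumed convention; the argument order (tp, fp, fn, tn) is also a choice made for this sketch.

```python
import math


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("empty confusion matrix")
    return (tp + tn) / total


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient in [-1, 1].

    Returns 0.0 when a marginal total is zero and MCC is mathematically
    undefined (a common convention, assumed here).
    """
    denom_sq = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denom_sq == 0:
        return 0.0
    return (tp * tn - fp * fn) / math.sqrt(denom_sq)


# Hypothetical imbalanced test set: 5 positives, 95 negatives.
# Classifier A labels everything negative.
print(accuracy(0, 0, 5, 95), mcc(0, 0, 5, 95))  # → 0.95 0.0
# Classifier B finds 4 of the 5 positives but raises 5 false alarms.
print(accuracy(4, 5, 1, 90), round(mcc(4, 5, 1, 90), 3))  # → 0.94 0.569
```

Accuracy ranks the useless all-negative classifier A above classifier B, whereas MCC scores A at 0 (no better than chance) and B clearly higher, because MCC uses all four cells of the confusion matrix.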

4. GIGO

My research "Gaining insights in datasets in the shade of 'garbage in, garbage out' rationale: Feature space distribution fitting" was published in WIREs Data Mining and Knowledge Discovery (SCI, Q1). Full text [March 2022]

5. TasKar: A machine learning research & education tool

An IEEE conference published my paper proposing a representation method for performance evaluation instruments, together with TasKar, a compact graphical tool for calculating and visualizing them. Download TasKar for free, and watch the video presentation on my YouTube channel. [December 2021]

6. A systematic ML process proposal

This paper, presented at an IEEE conference, proposes a systematic ML process designed as a cycle of eight sub-processes that traverse a series of introduced spaces (file, sample, class, feature, dataset, model, and finally metric spaces). The dataset quality analysis/comparison sub-process is designed specifically as a quality control gateway. The proposed process is explained through a case study in the Android mobile malware classification domain. [December 2021]

7. New insight in ML datasets

My article introducing a new method to compare machine learning datasets via multiple binary-feature frequency ranks has been published in the Hittite Journal of Science and Engineering. The free full text is available (click PDF on the page). [June 2021]
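The article's exact procedure is not reproduced here. As a hedged illustration of the general idea of comparing datasets through feature frequency ranks, this sketch ranks the shared binary features by frequency in each dataset and measures how well the two rankings agree using Spearman's rank correlation. Spearman is a standard statistic swapped in for illustration and is not necessarily the measure used in the article; the function names, feature names, and frequencies are hypothetical.

```python
import math
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values in descending order (1 = most frequent); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based; ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; applied to ranks it gives Spearman's rho (tie-aware)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant ranks; correlation undefined")
    return cov / (sx * sy)


def rank_agreement(freq_a: Dict[str, float], freq_b: Dict[str, float]) -> float:
    """Spearman correlation between the frequency ranks of features shared by two datasets."""
    shared = sorted(set(freq_a) & set(freq_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared features")
    ra = average_ranks([freq_a[f] for f in shared])
    rb = average_ranks([freq_b[f] for f in shared])
    return pearson(ra, rb)


# Hypothetical feature frequencies from two datasets.
a = {"INTERNET": 0.98, "SEND_SMS": 0.40, "READ_CONTACTS": 0.25, "CAMERA": 0.10}
b = {"INTERNET": 0.95, "SEND_SMS": 0.05, "READ_CONTACTS": 0.30, "CAMERA": 0.12}
print(round(rank_agreement(a, b), 3))  # → 0.4
```

A value near 1 means the two datasets order their features by prevalence in much the same way; here SEND_SMS drops from second to last place, pulling the agreement down to 0.4. Working with ranks rather than raw frequencies keeps the comparison insensitive to differences in dataset size.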