Developer Insights


Practical Tutorial on Random Forest and Parameter Tuning in R

Introduction

Treat "forests" well. Not for the sake of nature, but for solving problems too!

Random Forest is one of the most versatile machine learning algorithms available today. With its built-in ensembling capacity, the task of building a decent generalized model (on any dataset) gets much easier. However, I've seen people using random forest as a black box model; i.e., they don't understand what's happening beneath the code. They just code.

In fact, the easiest part of machine learning is coding. If you are new to machine learning, the random forest algorithm should be at your fingertips. Its ability to solve both regression and classification problems, along with its robustness to correlated features and its variable importance plot, gives us enough of a head start to solve various problems.

Most often, I've seen people getting confused between bagging and random forest. Do you know the difference?

In this article, I'll explain the complete concept of random forest and bagging. For ease of understanding, I've kept the explanation simple yet enriching. I've used the MLR and data.table packages to implement bagging and random forest with parameter tuning in R. Also, you'll learn the techniques I've used to improve model accuracy from ~82% to 86%.

Table of Contents

  1. What is the Random Forest algorithm?
  2. How does it work? (Decision Tree, Random Forest)
  3. What is the difference between Bagging and Random Forest?
  4. Advantages and Disadvantages of Random Forest
  5. Solving a Problem
    • Parameter Tuning in Random Forest

What is the Random Forest algorithm?

Random forest is a tree-based algorithm which involves building several trees (decision trees), then combining their output to improve the generalization ability of the model. The method of combining trees is known as an ensemble method. Ensembling is nothing but a combination of weak learners (individual trees) that produces a strong learner.

Say, you want to watch a movie. But you are uncertain of its reviews. You ask 10 people who have watched the movie. 8 of them said "the movie is fantastic." Since the majority is in favor, you decide to watch the movie. This is how we use ensemble techniques in our daily life too.

Random Forest can be used to solve regression and classification problems. In regression problems, the dependent variable is continuous. In classification problems, the dependent variable is categorical.

Trivia: The Random Forest algorithm was created by Leo Breiman and Adele Cutler in 2001.

How does it work? (Decision Tree, Random Forest)

To understand the working of a random forest, it's crucial that you understand a tree. A tree works in the following way:

[Image: a decision tree splitting the data, with root node X1 and child nodes X2, X3]

1. Given a data frame (n x p), a tree stratifies or partitions the data based on rules (if-else). Yes, a tree creates rules. These rules divide the data set into distinct and non-overlapping regions. These rules are determined by a variable's contribution to the homogeneity or pureness of the resultant child nodes (X2, X3).

2. In the image above, the variable X1 resulted in the highest homogeneity in the child nodes; hence, it became the root node. The variable at the root node is also seen as the most important variable in the data set.

3. But how is this homogeneity or pureness determined? In other words, how does the tree decide at which variable to split?

  • In regression trees (where the output is predicted using the mean of observations in the terminal nodes), the splitting decision is based on minimizing RSS. The variable which leads to the greatest possible reduction in RSS is chosen as the root node. Tree splitting takes a top-down, greedy approach, also known as recursive binary splitting. We call it "greedy" because the algorithm makes the best split at the current step rather than saving a split for better results at future nodes.
  • In classification trees (where the output is predicted using mode of observations in the terminal nodes), the splitting decision is based on the following methods:
    • Gini Index - It's a measure of node purity. If the Gini index takes on a smaller value, it suggests that the node is pure. For a split to take place, the Gini index for a child node should be less than that for the parent node.
    • Entropy - Entropy is a measure of node impurity. For a binary class (a, b), the formula to calculate it is shown below. Entropy is maximum at p = 0.5; i.e., when p(X=a) = p(X=b) = 0.5, a new observation has a 50-50 chance of being classified into either class. Entropy is minimum when the probability is 0 or 1.

Entropy = - p(a)*log(p(a)) - p(b)*log(p(b))

[Image: entropy as a function of class probability p, peaking at p = 0.5]

In a nutshell, every tree attempts to create rules in such a way that the resultant terminal nodes are as pure as possible. The higher the purity, the lower the uncertainty in making a decision.
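
To make these purity measures concrete, here is a tiny R sketch (hypothetical class proportions, not tied to any dataset used later) that computes the Gini index and entropy for a binary node:

# Purity measures for a binary node, given the proportion p of class "a"
gini    <- function(p) 1 - p^2 - (1 - p)^2                   # smaller = purer
entropy <- function(p) -p * log2(p) - (1 - p) * log2(1 - p)   # log base 2, so the maximum is 1

gini(0.5); entropy(0.5)    # most impure node: 0.50 and 1.00
gini(0.9); entropy(0.9)    # much purer node:  0.18 and ~0.47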

But a decision tree suffers from high variance. "High variance" means getting high prediction error on unseen data. We can overcome the variance problem by using more data for training. But since the available data set is limited, we can use resampling techniques like bagging and random forest to get more out of it.

Building many decision trees results in a forest. A random forest works the following way:

  1. First, it uses the Bagging (Bootstrap Aggregating) algorithm to create random samples. Given a data set D1 (n rows and p columns), it creates a new dataset (D2) by sampling n cases at random with replacement from the original data. About 1/3 of the rows from D1 are left out, known as Out of Bag (OOB) samples.
  2. Then, the model trains on D2. OOB sample is used to determine unbiased estimate of the error.
  3. Out of p columns, P ≪ p columns are selected at each node in the data set. The P columns are selected at random. Usually, the default choice of P is p/3 for regression tree and √p for classification tree.
  4. Unlike a decision tree, no pruning takes place in a random forest; i.e., each tree is grown fully. In decision trees, pruning is a method to avoid overfitting. Pruning means selecting a subtree that leads to the lowest test error rate. We can use cross-validation to determine the test error rate of a subtree.
  5. Several trees are grown and the final prediction is obtained by averaging (for regression) or majority voting (for classification).

Each tree is grown on a different sample of the original data. Since random forest computes an OOB error estimate internally, cross-validation doesn't add much in random forest.
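
Here is a minimal sketch of these steps using the randomForest package on R's built-in iris data (purely illustrative; the Adult dataset is analyzed later in this article). The printed model reports the OOB error estimate directly, which is why a separate cross-validation loop usually adds little:

library(randomForest)
set.seed(1)

# 500 fully grown trees, each on a bootstrap sample; mtry defaults to sqrt(p) for classification
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

print(rf)         # shows the OOB estimate of the error rate and the confusion matrix
importance(rf)    # variable importance (mean decrease in Gini)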

What is the difference between Bagging and Random Forest?

Many of us fail to realize that bagging is not the same as random forest. To understand the difference, let's see how bagging works:

  1. It creates randomized samples of the dataset (just like random forest) and grows trees on a different sample of the original data. The remaining 1/3 of the sample is used to estimate unbiased OOB error.
  2. It considers all the features at a node (for splitting).
  3. Once the trees are fully grown, it uses averaging or voting to combine the resultant predictions.

Aren't you thinking, "If both the algorithms do the same thing, what is the need for random forest? Couldn't we have accomplished our task with bagging?" NO!

The need for random forest surfaced after discovering that the bagging algorithm results in correlated trees when faced with a dataset having strong predictors. Unfortunately, averaging several highly correlated trees doesn't lead to a large reduction in variance.

But how do correlated trees emerge? Good question! Let's say a dataset has a very strong predictor, along with other moderately strong predictors. In bagging, nearly every tree would place that very strong predictor at its root node, thereby resulting in trees that look similar to each other.

The main difference between random forest and bagging is that random forest considers only a subset of predictors at a split. This results in trees with different predictors at the top split, thereby resulting in decorrelated trees and more reliable average output. That's why we say random forest is robust to correlated predictors.
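
One way to see this difference in code: with the randomForest package, bagging is simply a random forest with mtry set to the total number of predictors. A minimal sketch on R's built-in iris data (used here purely for illustration, not the Adult data analyzed later):

library(randomForest)
set.seed(1)

p <- ncol(iris) - 1    # number of predictors

# Bagging: every split is allowed to consider all p predictors
bag <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = p)

# Random forest: only a random subset of sqrt(p) predictors is considered per split
rf  <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = floor(sqrt(p)))

bag    # compare the OOB error estimates of the two models
rf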

Advantages and Disadvantages of Random Forest

Advantages are as follows:

  1. It is robust to correlated predictors.
  2. It is used to solve both regression and classification problems.
  3. It can also be used to solve unsupervised ML problems.
  4. It can handle thousands of input variables without variable selection.
  5. It can be used as a feature selection tool using its variable importance plot.
  6. It takes care of missing data internally in an effective manner.

Disadvantages are as follows:

  1. The Random Forest model is difficult to interpret.
  2. It tends to return erratic predictions for observations out of the range of training data. For example, if the training data contains a variable x ranging from 30 to 70, and the test data has x = 200, random forest would give an unreliable prediction.
  3. It can take longer than expected to compute a large number of trees.

Solving a Problem (Parameter Tuning)

Let's take a dataset to compare the performance of bagging and random forest algorithms. Along the way, I'll also explain important parameters used for parameter tuning. In R, we'll use MLR and data.table packages to do this analysis.

I've taken the Adult dataset from the UCI machine learning repository. You can download the data from here.

This dataset presents a binary classification problem to solve. Given a set of features, we need to predict if a person's salary is <=50K or >50K. Since the given data isn't well structured, we'll need to make some modifications while reading the dataset.

# Set working directory
path <- "~/December 2016/RF_Tutorial"
setwd(path)

# Load libraries
library(data.table)
library(mlr)
library(h2o)

# Set variable names
setcol <- c("age",
            "workclass",
            "fnlwgt",
            "education",
            "education-num",
            "marital-status",
            "occupation",
            "relationship",
            "race",
            "sex",
            "capital-gain",
            "capital-loss",
            "hours-per-week",
            "native-country",
            "target")

# Load data
train <- read.table("adultdata.txt", header = FALSE, sep = ",", 
                    col.names = setcol, na.strings = c(" ?"), stringsAsFactors = FALSE)
test <- read.table("adulttest.txt", header = FALSE, sep = ",", 
                   col.names = setcol, skip = 1, na.strings = c(" ?"), stringsAsFactors = FALSE)

After we've loaded the dataset, we'll first convert its class to data.table. data.table is a powerful R package built for fast data manipulation.


>setDT(train)
>setDT(test)

Now, we'll quickly look at given variables, data dimensions, etc.


>dim(train)
>dim(test)
>str(train)
>str(test)

As seen from the output above, we can derive the following insights:

  1. The train dataset has 32,561 rows and 15 columns.
  2. The test dataset has 16,281 rows and 15 columns.
  3. Variable target is the dependent variable.
  4. The target variable in train and test data is different. We'll need to match them.
  5. All character variables have a leading whitespace which can be removed.

We can check missing values using:

# Check missing values in train and test datasets
>table(is.na(train))
# Output:
#  FALSE   TRUE 
#  484153  4262

>sapply(train, function(x) sum(is.na(x)) / length(x)) * 100

table(is.na(test))
# Output:
#  FALSE  TRUE 
#  242012 2203

>sapply(test, function(x) sum(is.na(x)) / length(x)) * 100

As seen above, both train and test datasets have missing values. The sapply function is quite handy when it comes to performing column computations. Above, it returns the percentage of missing values per column.

Now, we'll preprocess the data to prepare it for training. In R, random forest internally takes care of missing values using mean/mode imputation. Practically speaking, sometimes it takes longer than expected for the model to run.

Therefore, in order to avoid waiting time, let's impute the missing values using median/mode imputation method; i.e., missing values in the integer variables will be imputed with median and in the factor variables with mode (most frequent value).

We'll use the impute function from the mlr package, which is enabled with several unique methods for missing value imputation:

# Impute missing values
>imp1 <- impute(data = train, target = "target", 
              classes = list(integer = imputeMedian(), factor = imputeMode()))

>imp2 <- impute(data = test, target = "target", 
              classes = list(integer = imputeMedian(), factor = imputeMode()))

# Assign the imputed data back to train and test
>train <- imp1$data
>test <- imp2$data

Since this is a binary classification problem, it's always advisable to check whether the data is imbalanced. We can do that in the following way:

# Check class distribution in train and test datasets
setDT(train)[, .N / nrow(train), target]
# Output:
#    target     V1
# 1: <=50K   0.7591904
# 2: >50K    0.2408096

setDT(test)[, .N / nrow(test), target]
# Output:
#    target     V1
# 1: <=50K.  0.7637737
# 2: >50K.   0.2362263

If you observe carefully, the target values in the test data carry a trailing period (e.g., "<=50K."), unlike those in the train data. For now, we can consider it a typo and correct all the test values. Also, we see that about 76% of people in the train data have income <=50K. This is skewed, but not severely; classification problems are usually considered imbalanced when the binary class distribution is closer to 90% to 10%. Now, let's proceed and clean the target column in the test data.

# Clean trailing character in test target values
test[, target := substr(target, start = 1, stop = nchar(target) - 1)]

We've used the substr function to return the substring from a specified start and end position. Next, we'll remove the leading whitespaces from all character variables. We'll use the str_trim function from the stringr package.

> library(stringr)
> char_col <- colnames(train)[sapply(train, is.character)]
> for(i in char_col)
>     set(train, j = i, value = str_trim(train[[i]], side = "left"))

Using the sapply function, we've extracted the column names which have character class. Then, using a simple for loop with data.table's set function, we traversed those columns and applied the str_trim function.

Before we start model training, we should convert all character variables to factors; the MLR package doesn't support character features.


> fact_col <- colnames(train)[sapply(train, is.character)]
> for(i in fact_col)
      set(train, j = i, value = factor(train[[i]]))
> for(i in fact_col)
      set(test, j = i, value = factor(test[[i]]))

Let's start with modeling now. The MLR package has its own functions to convert data into a task, build learners, and optimize learning algorithms. I suggest you stick to the modeling structure described below for using MLR on any data set.

#create a task
> traintask <- makeClassifTask(data = train,target = "target")
> testtask <- makeClassifTask(data = test,target = "target")

#create learner
> bag <- makeLearner("classif.rpart", predict.type = "response")
> bag.lrn <- makeBaggingWrapper(learner = bag, bw.iters = 100, bw.replace = TRUE)

I've set up the bagging algorithm which will grow 100 trees on randomized samples of data with replacement. To check the performance, let's set up a validation strategy too:

#set 5 fold cross validation
> rdesc <- makeResampleDesc("CV", iters = 5L)

For faster computation, we'll use parallel computation backend. Make sure your machine / laptop doesn't have many programs running in the background.

#set parallel backend (Windows)
> library(parallelMap)
> library(parallel)
> parallelStartSocket(cpus = detectCores())

For Linux users, the function parallelStartMulticore(cpus = detectCores()) will activate the parallel backend. I've used all the cores here.

r <- resample(learner = bag.lrn,
              task = traintask,
              resampling = rdesc,
              measures = list(tpr, fpr, fnr, acc),
              show.info = T)

#[Resample] Result: 
# tpr.test.mean = 0.95,
# fnr.test.mean = 0.0505,
# fpr.test.mean = 0.487,
# acc.test.mean = 0.845

Since this is a binary classification problem, I've used the components of the confusion matrix to check the model's performance. With 100 trees, bagging has returned an accuracy of 84.5%, which is way better than the baseline accuracy of 75%. Let's now check the performance of random forest.

#make randomForest learner
> rf.lrn <- makeLearner("classif.randomForest")
> rf.lrn$par.vals <- list(ntree = 100L,
                          importance = TRUE)

> r <- resample(learner = rf.lrn,
                task = traintask,
                resampling = rdesc,
                measures = list(tpr, fpr, fnr, acc),
                show.info = T)

# Result:
# tpr.test.mean = 0.996,
# fpr.test.mean = 0.72,
# fnr.test.mean = 0.0034,
# acc.test.mean = 0.825

On this data set, random forest performs worse than bagging. Both used 100 trees, and random forest returns an overall accuracy of 82.5%. The apparent reason is that this model struggles with the negative class. As you can see, it classified 99.6% of the positive class correctly, which is better than the bagging model, but it misclassified 72% of the negative class.

Internally, random forest uses a cutoff of 0.5; i.e., if the predicted probability of the <=50K class for an unseen observation is higher than 0.5, it will be classified as <=50K. In random forest, we have the option to customize this internal cutoff. As the false positive rate is very high now, we'll increase the cutoff for the positive class (<=50K) and accordingly reduce it for the negative class (>50K). Then, we'll train the model again.

#set cutoff
> rf.lrn$par.vals <- list(ntree = 100L,
                          importance = TRUE,
                          cutoff = c(0.75, 0.25))

> r <- resample(learner = rf.lrn,
                task = traintask,
                resampling = rdesc,
                measures = list(tpr, fpr, fnr, acc),
                show.info = T)

#Result: 
# tpr.test.mean = 0.934,
# fpr.test.mean = 0.43,
# fnr.test.mean = 0.0662,
# acc.test.mean = 0.846

As you can see, we've improved the accuracy of the random forest model by about 2%, which is now slightly higher than that of the bagging model. Now, let's try and make this model better.

Parameter Tuning: Mainly, there are three parameters in the random forest algorithm which you should look at (for tuning):

  • ntree - As the name suggests, the number of trees to grow. More trees make the model more computationally expensive to build.
  • mtry - The number of variables randomly sampled as split candidates at each node. As mentioned above, the default value is p/3 for regression and sqrt(p) for classification. We should be careful with very small values of mtry, as they can weaken the individual trees.
  • nodesize - The minimum number of observations we want in the terminal nodes. This parameter is directly related to tree depth: the higher the number, the shallower the tree. With too shallow a tree, the model might fail to pick up useful signals from the data.

Let's get to the playground and try to improve our model's accuracy further. In the MLR package, you can list all the tuning parameters a model supports using:

> getParamSet(rf.lrn)

# set parameter space
params <- makeParamSet(
    makeIntegerParam("mtry", lower = 2, upper = 10),
    makeIntegerParam("nodesize", lower = 10, upper = 50)
)

# set validation strategy
rdesc <- makeResampleDesc("CV", iters = 5L)

# set optimization technique
ctrl <- makeTuneControlRandom(maxit = 5L)

# start tuning
> tune <- tuneParams(learner = rf.lrn,
                     task = traintask,
                     resampling = rdesc,
                     measures = list(acc),
                     par.set = params,
                     control = ctrl,
                     show.info = T)

[Tune] Result: mtry=2; nodesize=23 : acc.test.mean=0.858

After tuning, we have achieved an overall accuracy of 85.8%, which is better than our previous random forest model. This way you can tweak your model and improve its accuracy.
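
To actually use the tuned values, they can be set back on the learner and the model retrained; a minimal sketch (not part of the original walkthrough) that assumes the rf.lrn, tune, traintask, and testtask objects created above:

# push the tuned hyperparameters (mtry, nodesize) back into the learner
rf.tuned <- setHyperPars(rf.lrn, par.vals = tune$x)

# retrain on the full training task and evaluate on the test task
rf.model <- train(rf.tuned, traintask)
rf.pred  <- predict(rf.model, testtask)
performance(rf.pred, measures = acc)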

I'll leave you here. The complete code for this analysis can be downloaded from GitHub.

Summary

Don't stop here! There is still huge scope for improvement in this model. Cross-validation accuracy is generally more optimistic than true test accuracy. To make predictions on the test set, minimal additional preprocessing of the categorical variables is required. Do it and share your results in the comments below.

My motive in creating this tutorial was to get you started with the random forest model and the techniques to improve its accuracy. For better understanding, I suggest you read more about the confusion matrix. In this article, I've explained the working of decision trees, random forest, and bagging.

Did I miss out anything? Do share your knowledge and let me know your experience while solving classification problems in comments below.

Exclusive SQL Tutorial on Data Analysis in R

Introduction

Many people are pursuing data science as a career choice these days. With the recent data deluge, companies are voraciously headhunting people who can handle, understand, analyze, and model data.

Be it college graduates or experienced professionals, everyone is busy searching for the best courses or training material to become a data scientist. Some of them even manage to learn Python or R, but still can't land their first analytics job!

What most people fail to understand is that the data science/analytics industry isn't just limited to using Python or R. There are several other coding languages which companies use to run their businesses.

Among all, the most important and widely used language is SQL (Structured Query Language). You must learn it.

I've realized that, as a newbie, learning SQL at home is somewhat difficult. After all, setting up a server-enabled database engine isn't everybody's cup of tea. But don't you worry.

In this article, we'll learn all about SQL and how to write its queries.

Note: This article is meant to help R users who want to learn SQL from scratch. Even if you are new to R, you can still check out this tutorial, as the ultimate motive is to learn SQL.

Table of Contents

  1. Why learn SQL ?
  2. What is SQL?
  3. Getting Started with SQL
    • Data Selection
    • Data Manipulation
    • Strings & Dates
  4. Practising SQL in R

Why learn SQL ?

Good question! When I started learning SQL, I asked this question too. However, I had no one to answer it, so I decided to find out myself.

SQL is the de facto standard programming language used to handle relational databases.

Let's look at the dominance and popularity of SQL in the worldwide analytics / data science industry. According to an online survey conducted by O'Reilly Media in 2016, among all programming languages, SQL was used by 70% of the respondents, followed by R and Python. It was also discovered that people who know Excel (spreadsheets) tend to get a significant salary boost once they learn SQL.

Also, according to a survey done by datasciencecentral, it was inferred that R users tend to get a nice salary boost once they learn SQL. In a way, SQL as a language is meant to complement your current set of skills.

Since the 1970s, SQL has remained an integral part of popular databases such as Oracle, IBM DB2, Microsoft SQL Server, MySQL, etc. Not only will learning SQL alongside R increase your employability, but SQL itself can also open the door to database management roles.

What is SQL ?

SQL (Structured Query Language) is a special purpose programming language used to manage, extract, and aggregate data stored in large relational database management systems.

In simple words, think of a large machine (rectangular shape) consisting of many, many boxes (again rectangles). Each box comprises a table (dataset). This is a database. A database is an organized collection of data. Now, this database understands only one language, i.e, SQL. No English, Japanese, or Spanish. Just SQL. Therefore, SQL is a language which interacts with the databases to retrieve data.

Following are some important features of SQL:

  1. It allows us to create, update, retrieve, and delete data from the database.
  2. It works with popular database programs such as Oracle, DB2, SQL Server, etc.
  3. As databases store humongous amounts of data, SQL is widely known for its speed and efficiency.
  4. It is very simple and easy to learn.
  5. It comes with built-in string and date functions to execute date-time conversions.

Currently, businesses worldwide use both open source and proprietary relational database management systems (RDBMS) built around SQL.

Getting Started with SQL

Let's try to understand SQL commands now. Most of these commands are extremely easy to pick up as they are simple "English words." But make sure you get a proper understanding of their meanings and usage in SQL context. For your ease of understanding, I've categorized the SQL commands in three sections:

  1. Data Selection - These are SQL's indigenous commands used to retrieve tables from databases supported by logical statements.
  2. Data Manipulation - These commands would allow you to join and generate insights from data.
  3. Strings and Dates - These special commands would allow you to work diligently with dates and string variables.

Before we start, you must know that SQL primarily recognizes four data types. These are:

  1. Integers - This datatype is assigned to variables storing whole numbers (no decimals). For example, 123, 324, 90, 10, 1, etc.
  2. Boolean - This datatype is assigned to variables storing TRUE or FALSE data.
  3. Numeric - This datatype is assigned to variables storing decimal numbers. Internally, it is stored as a double-precision value. It can store up to 15-17 significant digits.
  4. Date/Time - This datatype is assigned to variables storing date-time information. Internally, it is stored as a timestamp.

That's all! If SQL finds a variable whose type is anything other than these four, it will throw read errors. For example, if a variable has numbers with a comma (like 432,), you'll get errors. SQL as a language is also very particular about the sequence of commands; if the sequence is not followed, it throws errors. Don't worry, I've defined the sequence below. Let's learn the commands. In the following section, we'll learn to use them with a data set.

Data Selection

  1. SELECT - It tells you which columns to select.
  2. FROM - It specifies the table (dataset) from which the columns are to be selected.
  3. LIMIT - By default, a command is executed on all rows in a table. This command limits the number of rows. Limiting the rows leads to faster execution of commands.
  4. WHERE - This command specifies a filter condition; i.e., the data retrieval has to be done based on some variable filtering.
  5. Comparison Operators - Everyone knows these operators as (=, !=, <, >, <=, >=). They are used in conjunction with the WHERE command.
  6. Logical Operators - The famous logical operators (AND, OR, NOT) are also used to specify multiple filtering conditions. Other operators include:
    • LIKE - It is used to match patterns (similar values) rather than exact values.
    • IN - It is used to specify the list of values to extract or leave out from a variable.
    • BETWEEN - It filters a variable within a specified range of values (inclusive).
    • IS NULL - It extracts rows where the specified column is missing; use IS NOT NULL to keep only non-missing rows.
  7. ORDER BY - It is used to order a variable in descending or ascending order.

Data Manipulation

  1. Aggregate Functions - These functions are helpful in generating quick insights from data sets.
    • COUNT - It counts the number of observations.
    • SUM - It calculates the sum of observations.
    • MIN/MAX - It calculates the minimum/maximum of the observations, from which the range of a numerical distribution can be derived.
    • AVG - It calculates the average (mean).
  2. GROUP BY - For categorical variables, it calculates the above stats based on their unique levels.
  3. HAVING - It filters groups produced by GROUP BY based on aggregate conditions (WHERE cannot be applied to aggregated values).
  4. DISTINCT - It returns the unique values of a variable (and, with COUNT, the number of unique values).
  5. CASE - It is used to create rules using if/else conditions.
  6. JOINS - Used to merge individual tables. It can implement:
    • INNER JOIN - Returns the rows with matching keys in both A and B, based on the joining criteria.
    • OUTER JOIN - A family of joins (left, right, full) that also keeps unmatched rows.
    • LEFT JOIN - Returns all rows from A, with the matching rows from B (NULLs where there is no match).
    • RIGHT JOIN - Returns all rows from B, with the matching rows from A (NULLs where there is no match).
    • FULL OUTER JOIN - Returns all rows from both tables, with NULLs where there is no match.
  7. ON - Used to specify the join condition (the key columns) when joining tables.
  8. UNION - Similar to rbind() in R. Combines two tables with identical variable names.

You can write complex join commands using comparison operators, WHERE, or ON to specify conditions.

[Image: Venn diagrams illustrating SQL join types]

Strings and Dates

  1. NOW - Returns current time.
  2. LEFT - Returns a specified number of characters from the left in a string.
  3. RIGHT - Returns a specified number of characters from the right in a string.
  4. LENGTH - Returns the length of the string.
  5. TRIM - Removes characters from the beginning and end of the string.
  6. SUBSTR - Extracts part of a string, given a specified start position and length.
  7. CONCAT - Combines strings.
  8. UPPER - Converts a string to uppercase.
  9. LOWER - Converts a string to lowercase.
  10. EXTRACT - Extracts date components such as day, month, year, etc.
  11. DATE_TRUNC - Truncates a date/timestamp to a specified precision (e.g., day, month, year).
  12. COALESCE - Returns the first non-NULL value among its arguments; commonly used to impute missing values.

These commands are not case sensitive, but consistency is important. SQL commands follow this standard sequence:

  1. SELECT
  2. FROM
  3. WHERE
  4. GROUP BY
  5. HAVING
  6. ORDER BY
  7. LIMIT

Practising SQL in R

For writing SQL queries, we'll use the sqldf package. It activates SQL in R using SQLite (default) and can be faster than base R for some manipulations. It also supports H2 Java database, PostgreSQL, and MySQL.

You can easily connect database servers using this package and query data. For more details, check the GitHub repo by its author.

When using SQL in R, think of R as the database machine. Load datasets using read.csv or read.csv.sql and start querying. Ready? Let’s begin! Code every line as you scroll. Practice builds confidence.

We'll use the babynames dataset. Install and load it with:

> install.packages("babynames")
> library(babynames)
> str(babynames)

This dataset contains about 1.8 million observations and 5 variables. The prop variable is the proportion of babies given a particular name in a given year. Now, load the sqldf package:

> install.packages("sqldf")
> library(sqldf)

Let’s check the number of rows in this data.

> sqldf("select count(*) from mydata")
#1825433

Ignore the warnings here. Next, let's look at the data — the first 10 rows:

> sqldf("select * from mydata limit 10")

* selects all columns. To select specific variables:

> sqldf("select year, sex, name from mydata limit 10")

To rename a column in the output, use AS:

> sqldf("select year, sex as 'Gender' from mydata limit 10")

Filtering data with WHERE and logical conditions:

> sqldf("select year, name, sex as 'Gender' from mydata where sex == 'F' limit 20")
> sqldf("select * from mydata where prop > 0.05 limit 20")
> sqldf("select * from mydata where sex != 'F'")
> sqldf("select year, name, 4 * prop as 'final_prop' from mydata where prop <= 0.40 limit 10")

Ordering data:

> sqldf("select * from mydata order by year desc limit 20")
> sqldf("select * from mydata order by year desc, n desc limit 20")
> sqldf("select * from mydata order by name limit 20")

Filtering with string patterns:

> sqldf("select * from mydata where name like 'Ben%'")
> sqldf("select * from mydata where name like '%man' limit 30")
> sqldf("select * from mydata where name like '%man%'")
> sqldf("select * from mydata where name in ('Coleman','Benjamin','Bennie')")
> sqldf("select * from mydata where year between 2000 and 2014")

Multiple filters with logical operators:

> sqldf("select * from mydata where year >= 1980 and prop < 0.5")
> sqldf("select * from mydata where year >= 1980 and prop < 0.5 order by prop desc")
> sqldf("select * from mydata where name != '%man%' or year > 2000")
> sqldf("select * from mydata where prop > 0.07 and year not between 2000 and 2014")
> sqldf("select * from mydata where n > 10000 order by name desc")

Basic aggregation:

> sqldf("select sum(n) as 'Total_Count' from mydata")
> sqldf("select min(n), max(n) from mydata")
> sqldf("select year, avg(n) as 'Average' from mydata group by year order by Average desc")
> sqldf("select year, count(*) as count from mydata group by year limit 100")
> sqldf("select year, n, count(*) as 'my_count' from mydata where n > 10000 group by year order by my_count desc limit 100")

Using HAVING instead of WHERE for aggregations:

> sqldf("select year, name, sum(n) as 'my_sum' from mydata group by year having my_sum > 10000 order by my_sum desc limit 100")

Counting distinct names:

> sqldf("select count(distinct name) as 'count_names' from mydata")

Creating new columns using CASE (if/else logic):

> sqldf("select year, n, case when year = '2014' then 'Young' else 'Old' end as 'young_or_old' from mydata limit 10")
> sqldf("select *, case when name != '%man%' then 'Not_a_man' when name = 'Ban%' then 'Born_with_Ban' else 'Un_Ban_Man' end as 'Name_Fun' from mydata")

Joining data sets using a key:

> crash <- read.csv.sql("crashes.csv", sql = "select * from file")
> roads <- read.csv.sql("roads.csv", sql = "select * from file")
> sqldf("select * from crash join roads on crash.Road = roads.Road")
> sqldf("select crash.Year, crash.Volume, roads.* from crash left join roads on crash.Road = roads.Road")

Joining with aggregation and multiple keys:

> sqldf("select crash.Year, crash.Volume, roads.* from crash left join roads on crash.Road = roads.Road order by 1")
> sqldf("select crash.Year, crash.Volume, roads.* from crash left join roads on crash.Road = roads.Road where roads.Road != 'US-36' order by 1")
> sqldf("select Road, avg(roads.Length) as 'Avg_Length', avg(N_Crashes) as 'Avg_Crash' from roads join crash using (Road) group by Road")
> roads$Year <- crash$Year[1:5]
> sqldf("select crash.Year, crash.Volume, roads.* from crash left join roads on crash.Road = roads.Road and crash.Year = roads.Year order by 1")

String operations in sqldf with RSQLite extension:

> library(RSQLite)
> help("initExtension")

> sqldf("select name, leftstr(name, 3) as 'First_3' from mydata order by First_3 desc limit 100")
> sqldf("select name, reverse(name) as 'Rev_Name' from mydata limit 100")
> sqldf("select name, rightstr(name, 3) as 'Back_3' from mydata order by First_3 desc limit 100")

Summary

The aim of this article was to help you get started writing queries in SQL using a blend of practical and theoretical explanations. Beyond these queries, SQL also allows you to write subqueries aka nested queries to execute multiple commands in one go. We shall learn about those in future tutorials.

As I said above, learning SQL will not only give you a fatter paycheck but also allow you to seek job profiles other than that of a data scientist. As I always say, SQL is easy to learn but difficult to master. Do practice enough.

In this article, we learned the basics of SQL. We learned about data selection, aggregation, and string manipulation commands in SQL. In addition, we also looked at industry trends to see whether SQL is the programming language you'll promise to learn in your New Year's resolution. So, will you?

If you get stuck with any query written above, do drop in your suggestions, questions, and feedback in comments below!

Winning the HackerEarth Machine Learning challenge

A 2-day experience at Societe Generale, Bengaluru

Societe Generale, one of the largest banks in France, in collaboration with HackerEarth, organized Brainwaves, the annual hackathon at Bengaluru on November 12–13, 2016. The theme of the hackathon this year was “Machine Learning”.

The hackathon had an online qualifier, from which the top 85 teams out of 2,200 registrations from all over India were selected for the final round. The final round was a 30-hour hackathon in which teams had to solve 1 of 3 given problems, spanning transaction fraud detection, image analytics, and text analytics.

I decided to solve the first one (fraud detection), since I have experience working with banking data at the firms I have previously worked with.

Top 3 teams pose for the customary picture

Brief Approach

For the first problem, we were given millions of historical transactions to find patterns in and use those patterns to flag anomalies in future transactions. We quickly skimmed through the data, built our machine learning model to predict fraud on future transactional data, and ranked #1 on the leaderboard.

Eventually, we also built dashboards which can be used for proactive, real-time monitoring to detect any kind of new anomaly, or to monitor transaction throughput, etc.

You could think of it like a one-stop control center with a global view of what’s going through the system. One commonly known fraudulent behaviour is that fraudsters try to exploit the system by making a high number of small debits and one large credit, swindling the money across countries and exchanges and thereby circumventing the system’s defences.

This particular pattern was quite challenging to incorporate into our machine learning model, and we are glad to have solved it to a good extent in 30 hours. Eventually, we had a good dashboard and a very good model, made an excellent pitch to the jury, and ranked 1st amongst 85 teams.


Experiences at the hackathon

The hackathon was very well organized in terms of the quality of problem statements in the online and offline rounds and the way the organizing team responded to queries. It was genuinely surprising to see many mentors walking over to our table, talking to us about our backgrounds, and providing various domain-related insights, which augmented our model and resulted in higher performance.

Our team with the amazing mentors

Even during the late hours, none of them really left the place; they would always come and check if we were stalled anywhere and help us directionally so that we made constant progress. Having participated in a lot of hackathons prior to this one, I was genuinely surprised by the energy levels of the mentors at Societe Generale.

To conclude, I would like to thank HackerEarth, Societe Generale, the mentors, and most importantly Phani Srinath and Supreeth Manyam for their fantastic work during the weekend. Great work, guys! If not for all of the above, I am sure that weekend wouldn’t have been so memorable.

Our team with Societe Generale India CEO

And yes… we partied long and hard that night!

This post was originally published here.

Descriptive statistics with Python-NumPy

Is it gonna rain today? Should I take my umbrella to the office or not? To answer such questions, we just take out our phones and check the weather forecast. How is this done? Computer models use statistics to compare past weather conditions with the current conditions to predict future weather. From studying the amount of fluoride that is safe in our toothpaste to predicting future stock rates, everything requires statistics. Data is everything in statistics. Calculating the range, median, and mode of a data set is all part of descriptive statistics.

Data representation, manipulation, and visualization are key components in statistics. You can read about it here.

The next important step is analyzing the data, which can be done using both descriptive and inferential statistics. Both descriptive and inferential statistics are used to analyze results and draw conclusions in most of the research studies conducted on groups of people.

Through this article, we will learn descriptive statistics using Python.


Introduction

Descriptive statistics describe the basic and important features of data. Descriptive statistics help simplify and summarize large amounts of data in a sensible manner. For instance, consider the Cumulative Grade Point Index (CGPI), which is used to describe the general performance of a student across a wide range of course experiences.

Descriptive statistics involve evaluating measures of center (centrality measures) and measures of dispersion (spread).

[Image: overview of descriptive statistics]

Centrality measures

Centrality measures give us an estimate of the center of a distribution and a sense of the typical value we would expect to see. The three major measures of center are the mean, median, and mode.

Machine Learning and Auto-Evaluation

Machine Learning

In very simple terms, Machine Learning is about training or teaching computers to take decisions or actions without explicitly programming them. For example, whenever you read a tweet or movie review, you can figure out if the views expressed are positive or negative. But can you teach a computer to determine the sentiment of that text? This has many real-life applications. For instance, when Donald Trump makes a speech, Twitter responds with a range of sentiments, and his campaign team can assess the overall sentiment using machine learning.

Another example: Baidu predicted that Germany would win the 2014 World Cup even before the match was played.

Weather Problem

Consider this small dataset of weather conditions and whether a game was played. The goal is to forecast whether one can play the game given the conditions.

Outlook Temperature Humidity Windy Play
Sunny Hot High False No
Rainy Mild High False Yes
Sunny Cool Normal False Yes

Definitions

Feature/Attribute: Outlook, Temperature, Humidity, and Windy are features or attributes that influence the outcome.

Outcome/Target: The result to be predicted, i.e., whether you can play or not.

Vector: A row in the dataset representing an ordered collection of features (e.g., Sunny, Hot, High, False).

ML Model: The algorithm or process generated from the learning process (e.g., Decision Trees, SVM, Naive Bayes).

Error Metric/Evaluation Metric: Used to assess the accuracy of an ML model’s predictions. Different types exist for different problems.
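
To tie these terms together, here is a minimal R sketch that encodes the small weather table from above as a data frame and fits a classification tree with the rpart package (just one possible ML model; the platform does not prescribe any particular algorithm):

library(rpart)

# The weather table above: four features (attributes) and one outcome (Play)
weather <- data.frame(
  Outlook     = factor(c("Sunny", "Rainy", "Sunny")),
  Temperature = factor(c("Hot", "Mild", "Cool")),
  Humidity    = factor(c("High", "High", "Normal")),
  Windy       = c(FALSE, FALSE, FALSE),
  Play        = factor(c("No", "Yes", "Yes"))
)

# An "ML model": a classification tree mapping feature vectors to the outcome
# (with only three rows the tree is trivial; this is purely to illustrate the terms)
model <- rpart(Play ~ ., data = weather, method = "class",
               control = rpart.control(minsplit = 2, cp = 0, xval = 0))

# Predict the outcome for a new feature vector
new_obs <- data.frame(Outlook = factor("Sunny", levels = levels(weather$Outlook)),
                      Temperature = factor("Hot", levels = levels(weather$Temperature)),
                      Humidity = factor("High", levels = levels(weather$Humidity)),
                      Windy = FALSE)
predict(model, newdata = new_obs, type = "class")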

Supporting ML Problems on HackerEarth

HackerEarth’s ML platform supports a typical machine learning flow. A dataset is split into training and test sets. Users train their models on the training set and predict outcomes on the test set. The test set does not include the target variable.

Example Dataset

Outlook Temperature Humidity Windy Play
Sunny Hot High False No
Rainy Mild High False Yes
Sunny Cool Normal False Yes
Overcast Hot High False Yes
Rainy Mild High False Yes
Overcast Hot Normal False Yes
Sunny Mild Normal True Yes
Sunny Mild High False No
Overcast Cool Normal True Yes
Rainy Mild High True Yes

Train Dataset (train.csv)

Outlook Temperature Humidity Windy Play
Sunny Hot High False No
Rainy Mild High False Yes
Sunny Cool Normal False Yes
Overcast Hot High False Yes
Rainy Mild High False Yes
Overcast Hot Normal False Yes

Test Dataset (test.csv)

Id Outlook Temperature Humidity Windy
1 Sunny Mild Normal True
2 Sunny Mild High False
3 Overcast Cool Normal True
4 Rainy Mild High True

Notice the absence of the target variable in the test data.

User Prediction File (user_prediction.csv)

Id Play
1 Yes
2 Yes
3 No
4 No

Correct Prediction File (correct_prediction.csv)

Id Play
1 Yes
2 No
3 Yes
4 Yes

Evaluation Metric

During the contest, only 50% of the test dataset is used for evaluation to discourage overfitting. The evaluation metric is defined as:

Score = Number of correct predictions / Total rows

In this case, only ID 1 is predicted correctly out of the first two, so:

Score online = 1 / 2 = 0.5

After the contest, the model is evaluated on the full test dataset:

Score offline = 1 / 4 = 0.25

This demonstrates how overfitting can reduce real-world model performance. Online evaluations using partial data help encourage more generalizable solutions.
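
A minimal R sketch of this scoring scheme (file names as in the example above; treating the first half of the Ids as the online evaluation set is an assumption made purely for illustration):

# Read the submitted and the reference predictions
user    <- read.csv("user_prediction.csv")       # columns: Id, Play
correct <- read.csv("correct_prediction.csv")    # columns: Id, Play

# Line the two files up by Id and flag correct predictions
merged <- merge(user, correct, by = "Id", suffixes = c(".user", ".correct"))
hits   <- merged$Play.user == merged$Play.correct

# Online score: computed on only half of the test rows during the contest
online <- merged$Id <= ceiling(max(merged$Id) / 2)
mean(hits[online])    # 1/2 = 0.5 in the example above

# Offline score: computed on the full test set after the contest ends
mean(hits)            # 1/4 = 0.25 in the example above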

7 open source IoT operating systems that are democratizing the IoT space

The power of Open Source is the power of the people. The people rule

– Philippe Kahn (creator of the world’s first camera phone)

The open source movement, which started in 1998 as a niche, went on to become mainstream after the success of projects like Linux, Ubuntu, MySQL, Apache, etc. Later, big companies like IBM, Microsoft, and Apple began adopting the open-source software development model.

Today, companies in the IoT space like Samsung, Google, Huawei, and ARM are embracing open source by opening their hardware and software projects to the developer community and inviting contributions to build robust and reliable technology.

Here are 7 open-source operating systems for IoT devices that power a wide range of smart devices, from wearables to autonomous vehicles.

Note: All listed operating systems have two key traits:

  • Low memory footprint
  • High power efficiency

Brillo

Google's Android-based OS for embedded devices, capable of running on devices with at least 128MB ROM and 32MB RAM.

Supported communication protocols:

  • Wi-Fi
  • Bluetooth
  • Thread

Brillo supports secure boot, OTA updates, and architectures including ARM, Intel, and MIPS.

Contiki

Created in 2002 by Adam Dunkels, this BSD-licensed OS includes a TCP/IP stack and multitasking support, and runs on devices with 30KB RAM/ROM.

Supported hardware includes:

  • TI CC2538
  • nRF52832
  • TI MSP430x
  • Atmel AVR, Atmega128rfa1

RIOT

Developed by a German-French university consortium, RIOT is a real-time OS using a microkernel. It runs on 8-32bit microcontrollers.

Supported protocols:

  • 802.15.4 Zigbee
  • 6LoWPAN
  • ICMP6, IPv6, RPL, CoAP

Runs on devices with 1.5KB RAM and 5KB ROM; supports MSP430, ARM7, Cortex-M0/M3/M4, x86.

Huawei LiteOS

Developed by Huawei, this 10KB-size real-time OS supports auto discovery, zero configuration, and networking.

Supported protocols:

  • LTE, NB-IoT
  • Wi-Fi, 6LoWPAN

Supports multi-CPU architectures: ARM, DSP, MIPS, x86. Can integrate with Android devices and third-party systems.

Apache Mynewt

Apache-licensed real-time OS that runs on devices with 8KB RAM and 64KB ROM. The 6KB kernel supports:

  • Preemptive multithreading
  • Priority scheduling
  • Memory management
  • Watchdogs

Currently supports only Bluetooth Low Energy but future support includes Wi-Fi, Thread, Bluetooth 5.

Supported boards:

  • Arduino Zero, Zero Pro, M0 Pro
  • Arduino 101, Primo

Zephyr

A Linux Foundation project under Apache 2.0 license, launched in 2016. It uses static compilation for increased security.

Minimum memory: 8KB. Supported protocols:

  • Bluetooth, Bluetooth LE
  • Wi-Fi, 6LoWPAN
  • CoAP, NFC

Supported architectures: ARM, x86, ARC, RISC-V, NIOS-II.

Ubuntu Core

Canonical’s Snappy Ubuntu Core 16 is based on snap packages. The base image is 350MB, with all components stored as isolated images.

Supported boards:

  • Qualcomm Dragonboard
  • Samsung Artik
  • Intel Joule
  • Raspberry Pi 2 and 3

Read more: Ubuntu Core 16: Building secure and interoperable IoT ecosystems.

Open source is not just a development model, but a powerful opportunity for coders worldwide to make an impact and touch lives through collaboration and contribution.


Forecasting Tech Hiring Trends For 2023 With 6 Experts

2023 is here, and it is time to look ahead. Start planning your tech hiring needs as per your business requirements, revamp your recruiting processes, and come up with creative ways to land that perfect “unicorn candidate”!

Right? Well, jumping in blindly without heeding what this year holds for you can be a mistake. So before you put together your plans, ask yourselves this: What are the most important 2023 recruiting trends in tech hiring that you should be prepared for? What are the predictions that will shape this year?

We went around and posed three important questions to industry experts that were on our minds. And what they had to say certainly gave us some food for thought!

Before we dive in, allow me to introduce you to our expert panel of six, who had so much to say from personal experience!

Meet the Expert Panel

Radoslav Stankov

Radoslav Stankov has more than 20 years of experience working in tech. He is currently Head of Engineering at Product Hunt. Enjoys blogging, conference speaking, and solving problems.

Mike Cohen

Mike “Batman” Cohen is the Founder of Wayne Technologies, a Sourcing-as-a-Service company providing recruitment data and candidate outreach services to enhance the talent acquisition journey.

Pamela Ilieva

Pamela Ilieva is the Director of International Recruitment at Shortlister, a platform that connects employers to wellness, benefits, and HR tech vendors.

Brian H. Hough

Brian H. Hough is a Web2 and Web3 software engineer, AWS Community Builder, host of the Tech Stack Playbook™ YouTube channel/podcast, 5-time global hackathon winner, and tech content creator with 10k+ followers.

Steve O'Brien

Steve O'Brien is Senior Vice President, Talent Acquisition at Syneos Health, leading a global team of top recruiters across 30+ countries in 24+ languages, with nearly 20 years of diverse recruitment experience.

Patricia (Sonja Sky) Gatlin

Patricia (Sonja Sky) Gatlin is a New York Times featured activist, DEI Specialist, EdTechie, and Founder of Newbies in Tech. With 10+ years in Higher Education and 3+ in Tech, she now works part-time as a Diversity Lead recruiting STEM professionals to teach gifted students.

Overview of the upcoming tech industry landscape in 2024

Continued emphasis on remote work and flexibility: As we move into 2024, the tech industry is expected to continue embracing remote work and flexible schedules. This trend, accelerated by the COVID-19 pandemic, has proven to be more than a temporary shift. Companies are finding that remote work can lead to increased productivity, a broader talent pool, and better work-life balance for employees. As a result, recruiting strategies will likely focus on leveraging remote work capabilities to attract top talent globally.

Rising demand for AI and Machine Learning Skills: Artificial Intelligence (AI) and Machine Learning (ML) continue to be at the forefront of technological advancement. In 2024, these technologies are expected to become even more integrated into various business processes, driving demand for professionals skilled in AI and ML. Companies will likely prioritize candidates with expertise in these areas, and there may be an increased emphasis on upskilling existing employees to meet this demand.

Increased focus on cybersecurity: With the digital transformation of businesses, cybersecurity remains a critical concern. The tech industry in 2024 is anticipated to see a surge in the need for cybersecurity professionals. Companies will be on the lookout for talent capable of protecting against evolving cyber threats and ensuring data privacy.

Growth in cloud computing and edge computing: Cloud computing continues to grow, but there is also an increasing shift towards edge computing – processing data closer to where it is generated. This shift will likely create new job opportunities and skill requirements, influencing recruiting trends in the tech industry.

Sustainable technology and green computing: The global emphasis on sustainability is pushing the tech industry towards green computing and environmentally friendly technologies. In 2024, companies may seek professionals who can contribute to sustainable technology initiatives, adding a new dimension to tech recruiting.

Emphasis on soft skills: While technical skills remain paramount, soft skills like adaptability, communication, and problem-solving are becoming increasingly important. Companies are recognizing the value of these skills in fostering innovation and teamwork, especially in a remote or hybrid work environment.

Diversity, Equity, and Inclusion (DEI): There is an ongoing push towards more diverse and inclusive workplaces. In 2024, tech companies will likely continue to strengthen their DEI initiatives, affecting how they recruit and retain talent.

6 industry experts predict the 2023 recruiting trends

#1 We've seen many important moments in the tech industry this year...

Rado: In my opinion, a lot of those will carry over. I felt this was a preparation year for what was to come...

Mike: I wish I had the crystal ball for this, but I hope that when the market starts picking up again...

Pamela: Quiet quitting has been here way before 2022, and it is here to stay if organizations and companies...

Pamela Ilieva, Director of International Recruitment, Shortlister

Also, read: What Tech Companies Need To Know About Quiet Quitting


Brian: Yes, absolutely. In the 2022 Edelman Trust Barometer report...

Steve: Quiet quitting in the tech space will naturally face pressure as there is a redistribution of tech talent...

Patricia: Quiet quitting has been around for generations—people doing the bare minimum because they are no longer incentivized...

Patricia Gatlin, DEI Specialist and Curator, #blacklinkedin

#2 What is your pro tip for HR professionals/engineering managers...

Rado: Engineering managers should be able to do "more-with-less" in the coming year.

Radoslav Stankov, Head of Engineering, Product Hunt

Mike: Well first, (shameless plug), be in touch with me/Wayne Technologies as a stop-gap for when the time comes.

Mike “Batman” Cohen, Founder of Wayne Technologies

It's in the decrease and increase where companies find the hardest challenges...

Pamela: Remain calm – no need to “add fuel to the fire”!...

Brian: We have to build during the bear markets to thrive in the bull markets.

Companies can create internal hackathons to exercise creativity...


Also, read: Internal Hackathons - Drive Innovation And Increase Engagement In Tech Teams


Steve: HR professionals facing a hiring freeze will do well to “upgrade” processes, talent, and technology aggressively during downtime...

Steve O'Brien, Senior Vice President, Talent Acquisition at Syneos Health

Patricia: Talk to hiring managers in all your departments. Ask, what are the top 3-5 roles they are hiring for in the new year?...


Also, watch: 5 Recruiting Tips To Navigate The Hiring Freeze With Shalini Chandra, Senior TA, HackerEarth


#3 What top 3 skills would you like HR professionals/engineering managers to add to their repertoire in 2023 to deal with upcoming challenges?


Rado: Prioritization, team time, and environment management.

I think "prioritization" and "team time" management are obvious. But what do I mean by "environment management"?

A productive environment is one of the key ingredients of a productive team. Look at where your team wastes the most time and what can be automated. For example, writing end-to-end tests takes time because our tools are cumbersome and undocumented. So let's improve this.

Mike: Setting better metrics/KPIs, moving away from LinkedIn, and sharing more knowledge.

  1. Metrics/KPIs: Become better at setting measurable KPIs and accountable metrics. They are not the same thing—it's like the Square and Rectangle. One fits into the other but they're not the same. Hold people accountable to metrics, not KPIs. Make sure your metrics are aligned with company goals and values, and that they push employees toward excellence, not mediocrity.
  2. Freedom from LinkedIn: This is every year, and will probably continue to be. LinkedIn is a great database, but it is NOT the only way to find candidates, and oftentimes, not even the most effective/efficient. Explore other tools and methodologies!
  3. Join the conversation: I'd love to see new names of people presenting at conferences and webinars, and new authors on the popular TA content websites. Everyone has things they can share—be a part of the community, not just a user of it. Join FB groups, write and post articles, and comment on other people's posts with more than 'Great article'. It's a great community, but it's only great because of the people who contribute to it—be one of those people.

Pamela: Resilience, leveraging data, and self-awareness.

  1. Resilience: A “must-have” skill for the 21st century due to constant changes in the tech industry. Face and adapt to challenges. Overcome them and handle disappointments. Never give up. This will keep HR people alive in 2023.
  2. Data skills: Get some data analyst skills. The ability to translate numbers into insights can help you be a better HR professional, prepared to improve the employee experience and show your leadership team how HR is leveraging data to drive business results.
  3. Self-awareness: Allows you to react better to upsetting situations and workplace challenges. It is a healthy skill to cultivate – especially as an HR professional.

Also, read: Diving Deep Into The World Of Data Science With Ashutosh Kumar


Brian: Agility, resourcefulness, and empathy.

  1. Agility: Allows professionals to move with market conditions. Always be as prepared as possible for any situation to come. Be flexible based on what does or does not happen.
  2. Resourcefulness: Allows professionals to do more with less. It also helps them focus on how to amplify, lift, and empower the current teams to be the best they can be.
  3. Empathy: Allows professionals to take a more proactive approach to listening and understanding where all workers are coming from. Amid stressful situations, companies need empathetic team members and leaders alike who can meet each other wherever they are and be a support.

Steve: Negotiation, data management, and talent development.

  1. Negotiation: Wage transparency laws will fundamentally change the compensation conversation. We must ensure we are still discussing compensation early in the process, and not just “assume” everyone’s on the same page because “the range is published”.
  2. Data management and predictive analytics: Looking at your organization's talent needs as a casserole of indistinguishable components and demands will not be good enough. We must upgrade the accuracy and consistency of our data and the predictions we can make from it.

Also, read: The Role of Talent Intelligence in Optimizing Recruitment


  3. Talent development: We’ve been exploring the interplay between TA and TM for years. Now is the time to integrate your internal and external talent marketplaces. To provide career experiences to people within your organization and not just those joining your organization.

Patricia: Technology, research, and relationship building.

  1. Technology: Get better at understanding the technology that’s out there to help you speed up the process, track candidate experience, and eliminate bias. Metrics are becoming big in HR.
  2. Research: Honestly, read more books. Many great thought leaders put out content about the “future of work”, understanding “Gen Z”, or “quiet quitting.” Dedicate work hours to understanding your ever-changing field.
  3. Relationship Building: Especially in your immediate communities. Most people don’t know who you are or what exactly it is that you do. Build your personal brand and showcase what you are doing at your company to impact those closest to you. Create a referral funnel to get a pipeline going. When people want a job, you and your company ought to be top of mind. Also, tell the stories of the people who work there.

7 Tech Recruiting Trends To Watch Out For In 2024

The last couple of years transformed how the world works and the tech industry is no exception. Remote work, a candidate-driven market, and automation are some of the tech recruiting trends born out of the pandemic.

While accepting the new reality and adapting to it is the first step, keeping up with continuously changing hiring trends in technology is the bigger challenge right now.

What does 2024 hold for recruiters across the globe? What hiring practices would work best in this post-pandemic world? How do you stay on top of the changes in this industry?

The answers to these questions will paint a clearer picture of how to set up for success while recruiting tech talent this year.

7 tech recruiting trends for 2024


Recruiters, we’ve got you covered. Here are the tech recruiting trends that will change the way you build tech teams in 2024.

Trend #1—Leverage data-driven recruiting

Data-driven recruiting strategies are the answer to effective talent sourcing and a streamlined hiring process.

Talent acquisition leaders need to use real-time analytics like pipeline growth metrics, offer acceptance rates, quality and cost of new hires, and candidate feedback scores to reduce manual work, improve processes, and hire the best talent.

The key to capitalizing on talent market trends in 2024 is data. It enables you to analyze what’s working and what needs refinement, leaving room for experimentation.
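
To make this concrete, here is a minimal, illustrative sketch in R (every figure is hypothetical, not HackerEarth data) showing how a few of the metrics mentioned above—offer acceptance rate, cost per hire, and month-over-month pipeline growth—can be computed from a simple funnel table:

```r
# Illustrative recruiting-funnel metrics; every number below is hypothetical.
funnel <- data.frame(
  month              = c("Jan", "Feb", "Mar"),
  candidates_sourced = c(420, 510, 465),
  offers_made        = c(18, 22, 20),
  offers_accepted    = c(12, 17, 16),
  hiring_spend       = c(36000, 41000, 39000)  # in your reporting currency
)

# Offer acceptance rate: accepted offers as a share of offers made
funnel$offer_acceptance_rate <- round(funnel$offers_accepted / funnel$offers_made, 2)

# Cost per hire: total hiring spend divided by accepted offers
funnel$cost_per_hire <- round(funnel$hiring_spend / funnel$offers_accepted)

# Month-over-month pipeline growth, as a fraction of the previous month
funnel$pipeline_growth <- c(NA, round(diff(funnel$candidates_sourced) /
                                        head(funnel$candidates_sourced, -1), 2))

print(funnel)
```

Even a small table like this makes it easy to spot whether acceptance rates are slipping or cost per hire is creeping up from one month to the next.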

Trend #2—Have impactful employer branding

As seen in our 2021 State Of Developer Recruitment report, 98% of recruiters believe promoting company culture helps sourcing efforts.

Having a strong employer brand that supports a clear Employer Value Proposition (EVP) is crucial to influencing a candidate’s decision to work with your company. Perks like upskilling opportunities, remote work, and flexible hours are top EVPs that attract qualified candidates.

A clear EVP builds a culture of balance, mental health awareness, and flexibility—strengthening your employer brand with candidate-first policies.

Trend #3—Focus on candidate-driven market

The pandemic drastically increased the skills gap, making tech recruitment more challenging. With the severe shortage of tech talent, candidates now hold more power and can afford to be selective.

Competitive pay is no longer enough. Use data to understand what candidates want—work-life balance, remote options, learning opportunities—and adapt accordingly.

Recruiters need to think creatively to attract and retain top talent.


Recommended read: What NOT To Do When Recruiting Fresh Talent


Trend #4—Have a diversity and inclusion oriented company culture

Diversity and inclusion have become central to modern recruitment. While urgent hiring can delay D&I efforts, long-term success depends on inclusive teams. Our survey shows that 25.6% of HR professionals believe a diverse leadership team helps build stronger pipelines and reduces bias.

McKinsey’s Diversity Wins report confirms this: top-quartile gender-diverse companies see 25% higher profitability, and ethnically diverse teams show 36% higher returns.

It's refreshing to see the importance of an inclusive culture increasing across all job-seeking communities, especially in tech. This reiterates that D&I is a must-have, not just a good-to-have.

—Swetha Harikrishnan, Sr. HR Director, HackerEarth

Recommended read: Diversity And Inclusion in 2022 - 5 Essential Rules To Follow


Trend #5—Embed automation and AI into your recruitment systems

With the rise of AI tools like ChatGPT, automation is being adopted across every business function—including recruiting.

Manual communication with large candidate pools is inefficient. In 2024, recruitment automation and AI-powered platforms will automate candidate nurturing and communication, providing a more personalized experience while saving time.

Trend #6—Conduct remote interviews

With 32.5% of companies planning to stay remote, remote interviewing is here to stay.

Remote interviews expand access to global talent, reduce overhead costs, and increase flexibility—making the hiring process more efficient for both recruiters and candidates.

Trend #7—Be proactive in candidate engagement

Delayed responses or lack of updates can frustrate candidates and impact your brand. Proactive communication and engagement with both active and passive candidates are key to successful recruiting.

As recruitment evolves, proactive candidate engagement will become central to attracting and retaining talent. In 2023 and beyond, companies must engage both active and passive candidates through innovative strategies and technologies like chatbots and AI-powered systems. Building pipelines and nurturing relationships will enhance employer branding and ensure long-term hiring success.

—Narayani Gurunathan, CEO, PlaceNet Consultants

Recruiting Tech Talent Just Got Easier With HackerEarth

Recruiting qualified tech talent is tough—but we’re here to help. HackerEarth for Enterprises offers an all-in-one suite that simplifies sourcing, assessing, and interviewing developers.

Our tech recruiting platform enables you to:

  • Tap into a 6 million-strong developer community
  • Host custom hackathons to engage talent and boost your employer brand
  • Create online assessments to evaluate 80+ tech skills
  • Use dev-friendly IDEs and proctoring for reliable evaluations
  • Benchmark candidates against a global community
  • Conduct live coding interviews with FaceCode, our collaborative coding interview tool
  • Guide upskilling journeys via our Learning and Development platform
  • Integrate seamlessly with all leading ATS systems
  • Access 24/7 support with a 95% satisfaction score

Recommended read: The A-Zs Of Tech Recruiting - A Guide


Staying ahead of tech recruiting trends, improving hiring processes, and adapting to change is the way forward in 2024. Take note of the tips in this article and use them to build a future-ready hiring strategy.

Ready to streamline your tech recruiting? Try HackerEarth for Enterprises today.

Code In Progress - The Life And Times Of Developers In 2021

Developers. Are they as mysterious as everyone makes them out to be? Is coding the only thing they do all day? Good coders work around the clock, right?

While developers are some of the most coveted talent out there, they also have the most myths being circulated. Most of us forget that developers too are just like us. And no, they do not code all day long.

We wanted to bust a lot of these myths and shed light on how the programming world looks through a developer’s lens in 2021—especially in the wake of a global pandemic. This year’s edition of the annual HackerEarth Developer Survey is packed with developers’ wants and needs when choosing jobs, major gripes with the WFH scenario, and the latest market trends to watch out for, among others.

Our 2021 report is bigger and better, with responses from 25,431 developers across 171 countries. Let’s find out what makes a developer tick, shall we?

Developer Survey

“Good coders work around the clock.” No, they don’t.

Busting the myth that developers spend the better part of their day coding, 52% of student developers said that they prefer to code for a maximum of 3 hours per day.

When not coding, devs swear by their walks as a way to unwind. When we asked devs the same question last year, they said they liked to indulge in indoor games like foosball. In 2021, going for walks has become the most popular method of de-stressing. We’re chalking it up to working from home and not having a chance to stretch their legs.

Staying ahead of the skills game

Following the same trend as last year, students (39%) and working professionals (44%) voted for Go as one of the most popular programming languages that they want to learn. The other programming languages that devs are interested in learning are Rust, Kotlin, and Erlang.

Programming languages that students are most skilled at are HTML/CSS, C++, and Python. Senior developers are more comfortable working with HTML/CSS, SQL, and Java.

How happy are developers

Employees from middle market organizations had the highest 'happiness index' of 7.2. Experienced developers who work at enterprises are marginally less happy in comparison to people who work at smaller companies.

However, happiness is not a binding factor for where developers work. Despite scoring the least on the happiness scale, working professionals would still like to work at enterprise companies and growth-stage startups.

What works when looking for work

Student devs (63%), who are just starting in the tech world, said a good career growth curve is a must-have. Working professionals can be wooed by offers of a good career path (69%) and compensation (68%).

One trend that has changed since last year is that at least 50% of students and working professionals alike care a lot more about ESOPs and positive Glassdoor reviews now than they did in 2020.


To know more about what developers want, download your copy of the report now!


We went a step further and organized an event with our CEO, Sachin Gupta, Radoslav Stankov, Head of Engineering at Product Hunt, and Steve O’Brien, President of Talent Solutions at Job.com to further dissect the findings of our survey.

Tips straight from the horse’s mouth

Steve highlighted how the information collated from the developer survey affects the recruiting community and how they can leverage this data to hire better and faster.

  • The insight correlating developer happiness with work hours didn’t reveal a significant difference between the cohorts: devs working fewer than 40 hours seemed only marginally happier than those who clocked in more than 60 hours a week.
“This is an interesting data point, which shows that devs are passionate about what they do. You can increase their workload by 50% and still not affect their happiness. From a work perspective, as a recruiter, you have to get your hiring manager to understand that while devs never say no to more work, HMs shouldn’t overload the devs. Devs are difficult to source and burnout only leads to killing your talent pool, which is something that you do not want,” says Steve.
  • Another insight open to interpretation: roughly 45% of both student and professional developers learned how to code in college.
“Let’s look at it differently. Less than half of the surveyed developers learned how to code in college. There’s a major segment of the market today that is not necessarily following the ‘college degree to getting a job’ path. Developers are beginning to look at their skillsets differently and using various platforms to upskill themselves. Development is not about pedigree, it’s more about the potential to demonstrate skills. This is an interesting shift in the way we approach testing and evaluating devs in 2021.”

Rado contextualized the data from the survey to see what it means for the developer community and what trends to watch out for in 2021.

  • Node.js and AngularJS are the most popular frameworks among students and professionals.
“I was surprised by how many young students wanted to learn AngularJS, given that it’s more of an enterprise framework. Another thing that stood out to me was that the younger generation wants to learn technologies that are not necessarily cool like ExtJS (35%). This is good because people are picking technologies that they enjoy working with instead of just going along with what everyone else is doing. This also builds a more diverse technology pool.” — Rado
  • 22% of devs say ‘Zoom Fatigue’ is real and directly affects productivity.
“Especially for younger people who still haven’t figured out a routine to develop their skills, there is something I’d like you to try out. Start using noise-canceling headphones. They help keep distractions to a minimum. I find clutter-free working spaces to be an interesting concept as well.”

The last year and a half have been a doozy for developers everywhere, with a lot of things changing, and some things staying the same. With our developer survey, we wanted to shine the spotlight on skill-based hiring and market trends in 2021—plus highlight the fact that developers too have their gripes and happy hours.

Uncover many more developer trends for 2021 with Steve and Rado below:


Best Pre-Employment Assessments: Optimizing Your Hiring Process for 2024

In today's competitive talent market, attracting and retaining top performers is crucial for any organization's success. However, traditional hiring methods like relying solely on resumes and interviews may not always provide a comprehensive picture of a candidate's skills and potential. This is where pre-employment assessments come into play.

What are Pre-Employment Assessments?

Pre-employment assessments are standardized tests and evaluations administered to candidates before they are hired. These assessments can help you objectively measure a candidate's knowledge, skills, abilities, and personality traits, allowing you to make data-driven hiring decisions.

By exploring and evaluating the best pre-employment assessment tools and tests available, you can:

  • Improve the accuracy and efficiency of your hiring process.
  • Identify top talent with the right skills and cultural fit.
  • Reduce the risk of bad hires.
  • Enhance the candidate experience by providing a clear and objective evaluation process.

This guide will provide you with valuable insights into the different types of pre-employment assessments available and highlight some of the best tools to help you optimize your hiring process for 2024.

Why pre-employment assessments are key in hiring

While resumes and interviews offer valuable insights, they can be subjective and susceptible to bias. Pre-employment assessments provide a standardized and objective way to evaluate candidates, offering several key benefits:

  • Improved decision-making:

    By measuring specific skills and knowledge, assessments help you identify candidates who possess the qualifications necessary for the job.

  • Reduced bias:

    Standardized assessments mitigate the risks of unconscious bias that can creep into traditional interview processes.

  • Increased efficiency:

    Assessments can streamline the initial screening process, allowing you to focus on the most promising candidates.

  • Enhanced candidate experience:

    When used effectively, assessments can provide candidates with a clear understanding of the required skills and a fair chance to showcase their abilities.

Types of pre-employment assessments

There are various types of pre-employment assessments available, each catering to different needs and objectives. Here's an overview of some common types:

1. Skill Assessments:

  • Technical Skills: These assessments evaluate specific technical skills and knowledge relevant to the job role, such as programming languages, software proficiency, or industry-specific expertise. HackerEarth offers a wide range of validated technical skill assessments covering various programming languages, frameworks, and technologies.
  • Soft Skills: These employment assessments measure non-technical skills like communication, problem-solving, teamwork, and critical thinking, crucial for success in any role.

2. Personality Assessments:

These employment assessments can provide insights into a candidate's personality traits, work style, and cultural fit within your organization.

3. Cognitive Ability Tests:

These tests measure a candidate's general mental abilities, such as reasoning, problem-solving, and learning potential.

4. Integrity Assessments:

These employment assessments aim to identify potential risks associated with a candidate's honesty, work ethic, and compliance with company policies.

By understanding the different types of assessments and their applications, you can choose the ones that best align with your specific hiring needs and ensure you hire the most qualified and suitable candidates for your organization.

Leading employment assessment tools and tests in 2024

Choosing the right pre-employment assessment tool depends on your specific needs and budget. Here's a curated list of some of the top pre-employment assessment tools and tests available in 2024, with brief overviews:

  • HackerEarth:

    A comprehensive platform offering a wide range of validated skill assessments in various programming languages, frameworks, and technologies. It also allows for the creation of custom assessments and integrates seamlessly with various recruitment platforms.

  • SHL:

    Provides a broad selection of assessments, including skill tests, personality assessments, and cognitive ability tests. They offer customizable solutions and cater to various industries.

  • Pymetrics:

    Utilizes gamified assessments to evaluate cognitive skills, personality traits, and cultural fit. They offer a data-driven approach and emphasize candidate experience.

  • Wonderlic:

    Offers a variety of assessments, including the Wonderlic Personnel Test, which measures general cognitive ability. They also provide aptitude and personality assessments.

  • Harver:

    An assessment platform focusing on candidate experience with video interviews, gamified assessments, and skills tests. They offer pre-built assessments and customization options.

Remember: This list is not exhaustive, and further research is crucial to identify the tool that aligns best with your specific needs and budget. Consider factors like the types of assessments offered, pricing models, integrations with your existing HR systems, and user experience when making your decision.

Choosing the right pre-employment assessment tool

Rather than reviewing every tool in depth, focus on 2–3 key platforms. For each platform, explore:

  • Target audience: Who are their assessments best suited for (e.g., technical roles, specific industries)?
  • Types of assessments offered: Briefly list the available assessment categories (e.g., technical skills, soft skills, personality).
  • Key features: Highlight unique functionalities like gamification, custom assessment creation, or seamless integrations.
  • Effectiveness: Briefly mention the platform's approach to assessment validation and reliability.
  • User experience: Consider including user reviews or ratings where available.

Comparative analysis of assessment options

Rather than running a feature-by-feature comparison of every tool, focus on specific use cases:

  • Technical skills assessment:

    Compare HackerEarth and Wonderlic based on their technical skill assessment options, focusing on the variety of languages/technologies covered and assessment formats.

  • Soft skills and personality assessment:

    Compare SHL and Pymetrics based on their approaches to evaluating soft skills and personality traits, highlighting any unique features like gamification or data-driven insights.

  • Candidate experience:

    Compare Harver and Wonderlic based on their focus on candidate experience, mentioning features like video interviews or gamified assessments.

Additional tips:

  • Visit the platforms' official websites for detailed features and pricing information.
  • Check reputable third-party review sites where users share their experiences with various tools.

Best practices for using pre-employment assessment tools

Integrating pre-employment assessments effectively requires careful planning and execution. Here are some best practices to follow:

  • Define your assessment goals:

    Clearly identify what you aim to achieve with assessments. Are you targeting specific skills, personality traits, or cultural fit?

  • Choose the right assessments:

    Select tools that align with your defined goals and the specific requirements of the open position.

  • Set clear expectations:

    Communicate the purpose and format of the assessments to candidates in advance, ensuring transparency and building trust.

  • Integrate seamlessly:

    Ensure your chosen assessment tool integrates smoothly with your existing HR systems and recruitment workflow.

  • Train your team:

    Equip your hiring managers and HR team with the knowledge and skills to interpret assessment results effectively.

Interpreting assessment results accurately

Assessment results offer valuable data points, but interpreting them accurately is crucial for making informed hiring decisions. Here are some key considerations:

  • Use results as one data point:

    Consider assessment results alongside other information, such as resumes, interviews, and references, for a holistic view of the candidate.

  • Understand score limitations:

    Don't solely rely on raw scores. Understand the assessment's validity and reliability and the potential for cultural bias or individual test anxiety.

  • Look for patterns and trends:

    Analyze results across different assessments and identify consistent patterns that align with your desired candidate profile.

  • Focus on potential, not guarantees:

    Assessments indicate potential, not guarantees of success. Use them alongside other evaluation methods to make well-rounded hiring decisions.

Choosing the right pre-employment assessment tools

Selecting the most suitable pre-employment assessment tool requires careful consideration of your organization's specific needs. Here are some key factors to guide your decision:

  • Industry and role requirements:

    Different industries and roles demand varying skill sets and qualities. Choose assessments that target the specific skills and knowledge relevant to your open positions.

  • Company culture and values:

    Align your assessments with your company culture and values. For example, if collaboration is crucial, look for assessments that evaluate teamwork and communication skills.

  • Candidate experience:

    Prioritize tools that provide a positive and smooth experience for candidates. This can enhance your employer brand and attract top talent.

Budget and accessibility considerations

Budget and accessibility are essential factors when choosing pre-employment assessments:

  • Budget:

    Assessment tools come with varying pricing models (subscriptions, pay-per-use, etc.). Choose a tool that aligns with your budget and offers the functionalities you need.

  • Accessibility:

    Ensure the chosen assessment is accessible to all candidates, considering factors like language options, disability accommodations, and internet access requirements.

Additional Tips:

  • Free trials and demos: Utilize free trials or demos offered by assessment platforms to experience their functionalities firsthand.
  • Consult with HR professionals: Seek guidance from HR professionals or recruitment specialists with expertise in pre-employment assessments.
  • Read user reviews and comparisons: Gain insights from other employers who use various assessment tools.

By carefully considering these factors, you can select the pre-employment assessment tool that best aligns with your organizational needs, budget, and commitment to an inclusive hiring process.

Remember, pre-employment assessments are valuable tools, but they should not be the sole factor in your hiring decisions. Use them alongside other evaluation methods and prioritize building a fair and inclusive hiring process that attracts and retains top talent.

Future trends in pre-employment assessments

The pre-employment assessment landscape is constantly evolving, with innovative technologies and practices emerging. Here are some potential future trends to watch:

  • Artificial intelligence (AI):

    AI-powered assessments can analyze candidate responses, written work, and even resumes, using natural language processing to extract relevant insights and identify potential candidates.

  • Adaptive testing:

    These assessments adjust the difficulty level of questions based on the candidate's performance, providing a more efficient and personalized evaluation (see the short sketch after this list).

  • Micro-assessments:

    Short, focused assessments delivered through mobile devices can assess specific skills or knowledge on-the-go, streamlining the screening process.

  • Gamification:

    Engaging and interactive game-based elements can make the assessment experience more engaging and assess skills in a realistic and dynamic way.
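
As a rough illustration of the adaptive-testing idea described above (a hypothetical sketch, not how any specific vendor implements it), the following R snippet raises the difficulty of the next question after a correct answer and lowers it after an incorrect one:

```r
set.seed(42)

# Simulated candidate: harder questions (1 = easy, 2 = medium, 3 = hard)
# are answered correctly less often. Purely illustrative probabilities.
ask <- function(difficulty) {
  runif(1) < (0.9 - 0.25 * (difficulty - 1))
}

current_difficulty <- 2   # start at medium
score <- 0

for (i in 1:5) {
  correct <- ask(current_difficulty)
  # Harder questions contribute more to the score when answered correctly
  score <- score + ifelse(correct, current_difficulty, 0)
  # Move difficulty up after a correct answer, down after an incorrect one
  current_difficulty <- min(3, max(1, current_difficulty + ifelse(correct, 1, -1)))
}

cat("Weighted score after 5 adaptive questions:", score, "\n")
```

Real adaptive engines typically rely on psychometric models such as item response theory, but the core feedback loop is the same: each response updates the difficulty of what comes next.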

Conclusion

Pre-employment assessments, when used thoughtfully and ethically, can be a powerful tool to optimize your hiring process, identify top talent, and build a successful workforce for your organization. By understanding the different types of assessments available, exploring top-rated tools like HackerEarth, and staying informed about emerging trends, you can make informed decisions that enhance your ability to attract, evaluate, and hire the best candidates for the future.

Tech Layoffs: What To Expect In 2024

Layoffs in the IT industry are becoming more widespread as companies fight to remain competitive in a fast-changing market; many turn to layoffs as a cost-cutting measure. Last year, 1,000 companies, including big tech giants and startups, laid off more than 200,000 employees. But first, what are layoffs in the tech business, and how do they impact the industry?

Tech layoffs are the termination of employment for some employees by a technology company. It might happen for various reasons, including financial challenges, market conditions, firm reorganization, or the after-effects of a pandemic. While layoffs are not unique to the IT industry, they are becoming more common as companies look for methods to cut costs while remaining competitive.

The consequences of layoffs in technology may be catastrophic for employees who lose their jobs and the firms forced to make these difficult decisions. Layoffs can result in the loss of skill and expertise and a drop in employee morale and productivity. However, they may be required for businesses to stay afloat in a fast-changing market.

This article will examine the reasons for layoffs in the technology industry, their influence on the industry, and what may be done to reduce their negative impacts. We will also look at the various methods for tracking tech layoffs.

What are tech layoffs?

The term "tech layoff" describes the termination of employees by an organization in the technology industry. A company might do this as part of a restructuring during hard economic times.

In recent times, the tech industry has witnessed a wave of significant layoffs, affecting some of the world’s leading technology companies, including Amazon, Microsoft, Meta (formerly Facebook), Apple, Cisco, SAP, and Sony. These layoffs are a reflection of the broader economic challenges and market adjustments facing the sector, including factors like slowing revenue growth, global economic uncertainties, and the need to streamline operations for efficiency.

Each of these tech giants has announced job cuts for various reasons, though common themes include restructuring efforts to stay competitive and agile, responding to over-hiring during the pandemic when demand for tech services surged, and preparing for a potentially tough economic climate ahead. Despite their dominant positions in the market, these companies are not immune to the economic cycles and technological shifts that influence operational and strategic decisions, including workforce adjustments.

This trend of layoffs in the tech industry underscores the volatile nature of the tech sector, which is often at the mercy of rapid changes in technology, consumer preferences, and the global economy. It also highlights the importance of adaptability and resilience for companies and employees alike in navigating the uncertainties of the tech landscape.

Causes for layoffs in the tech industry

Why are tech employees suffering so much?

Yes, the market is always uncertain, but why resort to tech layoffs?

Various factors cause tech layoffs, including company strategy changes, market shifts, or financial difficulties. Companies may lay off employees if they struggle to generate revenue, shift their focus to new products or services, or automate certain jobs.

In addition, some common reasons could be:

Financial struggles

Currently, the state of the global market is uncertain due to economic recession, ongoing war, and other related phenomena. If a company is experiencing financial difficulties, pay cuts alone may not be enough—it may need to reduce its workforce to cut costs.


Also, read: 6 Steps To Create A Detailed Recruiting Budget (Template Included)


Changes in demand

The tech industry is constantly evolving, and companies may have to adjust their workforce to meet changing market conditions. For instance, as companies adopt a remote work culture, on-premises activity drops, and some back-office tech roles may become redundant.

Restructuring

Companies may also lay off employees as part of a greater restructuring effort, such as spinning off a division or consolidating operations.

Automation

With the advancement in technology and automation, some jobs previously done by human labor may be replaced by machines, resulting in layoffs.

Mergers and acquisitions

When two companies merge, there is often overlap in their operations, leading to layoffs as the new company looks to streamline its workforce.

But it's worth noting that layoffs are not exclusive to the tech industry and can happen in any industry due to uncertainty in the market.

Will layoffs increase in 2024?

It is challenging to estimate the rise or fall of layoffs. The overall state of the economy, the health of certain industries, and the performance of individual companies will play a role in deciding the degree of layoffs in any given year.

It is also worth noting that, in the first 15 days of this year, 91 organizations laid off over 24,000 tech workers, and over 1,000 corporations cut more than 150,000 workers in 2022, according to an Economic Times article.

The COVID-19 pandemic caused a huge economic slowdown and forced several businesses to downsize their employees. However, some businesses rehired or expanded their personnel when the world began to recover.

So, given the current level of economic uncertainty, predicting how the situation will unfold is difficult.


Also, read: 4 Images That Show What Developers Think Of Layoffs In Tech


What types of companies are prone to tech layoffs?


Tech layoffs can occur in organizations of all sizes and various areas.

Following are some examples of companies that have experienced tech layoffs in the past:

Large tech firms

Companies such as IBM, Microsoft, Twitter, Better.com, Alibaba, and HP have all experienced layoffs in recent years as part of restructuring initiatives or cost-cutting measures.

The market is still finding its footing after Elon Musk's decision to lay off a large share of Twitter's workforce. Along with tech giants, some smaller companies and startups have also been affected by layoffs.

Startups

Because they frequently work with limited resources, startups may be forced to lay off staff if they cannot secure further funding or need to pivot due to a market downturn.

Small and medium-sized businesses

Small and medium-sized businesses face layoffs due to high competition or if the products/services they offer are no longer in demand.

Companies in certain industries

Some sectors of the technological industry, such as the semiconductor industry or automotive industry, may be more prone to layoffs than others.

Companies that lean on government funding

Companies that rely significantly on government contracts may face layoffs if the government cuts technology spending or contracts are not renewed.

How to track tech layoffs?

You can’t stop tech company layoffs, but you should be keeping track of them. We, HR professionals and recruiters, can also lend a helping hand in these tough times by circulating “layoff lists” across social media sites like LinkedIn and Twitter to help people land jobs quicker. Firefish Software put together a master list of sources to find fresh talent during the layoff period.

Because not all layoffs are publicly disclosed, tracking tech industry layoffs can be challenging, and some may go undetected. There are several ways to keep track of tech industry layoffs:

Use a tech layoffs tracker

Layoff trackers like thelayoff.com and layoffs.fyi provide up-to-date information on layoffs.

In addition, they aid in identifying trends in layoffs within the tech industry. It can reveal which industries are seeing the most layoffs and which companies are the most affected.

Companies can use layoff trackers as an early warning system and compare their performance to that of other companies in their field.

News articles

Because many news sites cover tech layoffs as they happen, keeping a watch on technology sector stories can provide insight into which organizations are laying off employees and how many individuals have been affected.

Social media

Organizations and employees frequently publish information about layoffs in tech on social media platforms; thus, monitoring companies' social media accounts or following key hashtags can provide real-time updates regarding layoffs.

Online forums and communities

There are online forums and communities dedicated to discussing tech industry news, and they can be an excellent source of layoff information.

Government reports

Government agencies such as the Bureau of Labor Statistics (BLS) publish data on layoffs and unemployment, which can provide a more comprehensive picture of the technology industry's status.

How do companies reduce tech layoffs?

Layoffs in tech are hard – for the employee who is losing their job, the recruiter or HR professional who is tasked with informing them, and the company itself. So, how can we aim to avoid layoffs? Here are some ways to minimize resorting to letting people go:

Salary reductions

Instead of laying off employees, businesses can lower the salaries or wages of all employees. It can be accomplished by instituting compensation cuts or salary freezes.

Implementing a hiring freeze

Businesses can halt employing new personnel to cut costs. It can be a short-term solution until the company's financial situation improves.


Also, read: What Recruiters Can Focus On During A Tech Hiring Freeze


Non-essential expense reduction

Businesses might search for ways to cut or remove non-essential expenses such as travel, training, and office expenses.

Reducing working hours

Companies can reduce employee working hours to save money, such as implementing a four-day workweek or a shorter workday.

These options may not always be viable and may have their problems, but before laying off, a company owes it to its people to consider every other alternative, and formulate the best solution.

Tech layoffs to bleed into this year

While we do not know whether this trend will continue or subside during 2023, we do know one thing: we have to be prepared for a wave of layoffs that is yet to hit. As of last month, Layoffs.fyi had already tracked 170+ companies conducting 55,970 layoffs in 2023.

So recruiters, let’s join arms, distribute those layoff lists like there’s no tomorrow, and help all those in need of a job! :)

What Is Headhunting In Recruitment? Types & How Does It Work?

In today’s fast-paced world, recruiting talent has become increasingly complicated. Technological advancements, high workforce expectations and a highly competitive market have pushed recruitment agencies to adopt innovative strategies for recruiting various types of talent. This article aims to explore one such recruitment strategy – headhunting.

What is Headhunting in recruitment?

In headhunting, companies or recruitment agencies identify, engage and hire highly skilled professionals to fill top positions in the respective companies. It is different from the traditional process in which candidates looking for job opportunities approach companies or recruitment agencies. In headhunting, executive headhunters, as recruiters are referred to, approach prospective candidates with the hiring company’s requirements and wait for them to respond. Executive headhunters generally look for passive candidates, those who work at crucial positions and are not on the lookout for new work opportunities. Besides, executive headhunters focus on filling critical, senior-level positions indispensable to companies. Depending on the nature of the operation, headhunting has three types. They are described later in this article. Before we move on to understand the types of headhunting, here is how the traditional recruitment process and headhunting are different.

How do headhunting and traditional recruitment differ from each other?

Headhunting is a type of recruitment process in which top-level managers and executives in similar positions are hired. Since these professionals are not on the lookout for jobs, headhunters have to thoroughly understand the hiring companies’ requirements and study the work profiles of potential candidates before creating a list.

In the traditional approach, there is a long list of candidates applying for jobs online and offline. Candidates approach recruiters for jobs. Apart from this primary difference, there are other factors that define the difference between these two schools of recruitment.

| Aspect | Headhunting | Traditional Recruitment |
| --- | --- | --- |
| Candidate Type | Primarily passive candidates | Active job seekers |
| Approach | Focused on specific high-level roles | Broader; includes various levels |
| Scope | Proactive outreach | Reactive: candidates apply |
| Cost | Generally more expensive due to expertise required | Typically lower costs |
| Control | Managed by headhunters | Managed internally by HR teams |

These parameters clarify how headhunting differs from traditional recruitment methods.

Types of headhunting in recruitment

Direct headhunting: In direct recruitment, hiring teams reach out to potential candidates through personal communication. Companies conduct direct headhunting in-house, without outsourcing the process to recruitment agencies. Very few businesses conduct this type of recruitment for top jobs, as it involves extensive screening across networks outside the company's own reach.

Indirect headhunting: This method involves recruiters getting in touch with prospective candidates through indirect modes of communication such as email and phone calls. Indirect headhunting is less intrusive and allows candidates to respond at their convenience.

Third-party recruitment: Companies approach external recruitment agencies or executive headhunters to recruit highly skilled professionals for top positions. This method often leverages the agency's extensive contact network and expertise in niche industries.

How does headhunting work?

Finding highly skilled professionals to fill critical positions can be tricky without a system for it. Expert executive headhunters employ recruitment software to run searches efficiently. Most software is AI-powered and expedites tasks like candidate sourcing, interactions with prospective professionals, and upkeep of communication history, making executive search in recruitment a little easier. Beyond the software itself, here are the various stages of finding high-calibre executives through headhunting.

Identifying the role

Once there is a vacancy for a top job, one of the top executives, such as the CEO, a director, or the head of the company, reaches out to the concerned personnel with their requirements. Depending on how large a company is, it may choose to headhunt with the help of an external recruiting agency or conduct the search in-house. Generally, the task is assigned to external recruitment agencies specializing in headhunting. Executive headhunters possess a database of highly qualified professionals who work in crucial positions at some of the best companies, which makes them the top choice of conglomerates looking to hire some of the best talent in the industry.

Defining the job

Once an executive headhunter or a recruiting agency is finalized, companies conduct meetings to discuss the nature of the role, how the company works, the management hierarchy among other important aspects of the job. Headhunters are expected to understand these points thoroughly and establish a clear understanding of their expectations and goals.

Candidate identification and sourcing

Headhunters analyse and understand the requirements of their clients and begin creating a pool of suitable candidates from their database. Professionals are shortlisted after extensive research into job profiles, years of industry experience, professional networks, and online platforms.

Approaching candidates

Once the potential candidates have been identified and shortlisted, headhunters move on to get in touch with them discreetly through various communication channels. As such candidates are already working at top level positions at other companies, executive headhunters have to be low-key while doing so.

Assessment and Evaluation

In this next step, extensive screening and evaluation of candidates is conducted to determine their suitability for the advertised position.

Interviews and negotiations

Compensation is a major topic of discussion among recruiters and prospective candidates. A lot of deliberation and negotiation goes on between the hiring organization and the selected executives which is facilitated by the headhunters.

Finalizing the hire

Things come to a close once the suitable candidates accept the job offer. On accepting the offer letter, headhunters help finalize the hiring process to ensure a smooth transition.

The steps listed above form the blueprint for a typical headhunting process. Headhunting has been crucial in helping companies hire the right people for crucial positions that come with great responsibility. However, all systems have a set of challenges no matter how perfect their working algorithm is. Here are a few challenges that talent acquisition agencies face while headhunting.

Common challenges in headhunting

Despite its advantages, headhunting also presents certain challenges:

Cost Implications: Engaging headhunters can be more expensive than traditional recruitment methods due to their specialized skills and services.

Time-Consuming Process: While headhunting can be efficient, finding the right candidate for senior positions may still take time due to thorough evaluation processes.

Market Competition: The competition for top talent is fierce; organizations must present compelling offers to attract passive candidates away from their current roles.

Although the above-mentioned factors can pose challenges in the headhunting process, there are more upsides than downsides. Here is how headhunting has helped revolutionize the recruitment of high-profile candidates.

Advantages of Headhunting

Headhunting offers several advantages over traditional recruitment methods:

Access to Passive Candidates: By targeting individuals who are not actively seeking new employment, organisations can access a broader pool of highly skilled professionals.

Confidentiality: The discreet nature of headhunting protects both candidates’ current employment situations and the hiring organisation’s strategic interests.

Customized Search: Headhunters tailor their search based on the specific needs of the organization, ensuring a better fit between candidates and company culture.

Industry Expertise: Many headhunters specialise in particular sectors, providing valuable insights into market dynamics and candidate qualifications.

Conclusion

Although headhunting can be costly and time-consuming, it is one of the most effective ways of finding good candidates for top jobs. Executive headhunters face several challenges in maintaining discretion while getting in touch with prospective candidates. As organizations navigate increasingly competitive markets, understanding the nuances of headhunting becomes vital for effective recruitment strategies. To keep up with technological advancements, it is better to optimise your hiring process by employing online recruitment software like HackerEarth, which enables companies to conduct multiple interviews and evaluation tests online, thus improving candidate experience. By collaborating with skilled headhunters who possess industry expertise and insights into market trends, companies can enhance their chances of securing high-caliber professionals who drive success in their respective fields.
