Skip to main content

LeetCode: Kth Largest Element in an Array using QuickSelect with Hoare Partition

· 3 min read
Asif Rehan
Senior Software Engineer, ML Infra @ Ford Pro

Finding the Kth Largest Element in an Array is a problem that can be solved using approaches like QuickSort in (O(n \log n)) time and then picking the (k)-th element in the sorted array. A faster way is to use QuickSelect, which solves it in (O(n)) time. The solution below outperforms 100% of Java submissions on LeetCode for runtime.

QuickSelect

QuickSelect is a sorting algorithm that uses partitioning schemes like Hoare’s or Lomuto’s. It finds the (k)-th smallest value in (O(n)) time by employing a divide-and-conquer strategy. It recursively partitions and processes only the relevant section of the array.

private void quickSelect(int left, int right){
if (left < right) {
int p = partition(left, right);
if (kSmallest <= p) quickSelect(left, p);
else {
quickSelect(p + 1, right);
}
}
}

Notice if $kSmallest &lt= p$, then we know the nums array is partitioned from left to p which has values smaller than the values on the right portion from indices p+1 to right. So we only recursively sort the left.

If that is not the case, then we can sort on the right. But we cannot stop if p==kSmallest like in Lomuto’s partition because Hoare’s does not ensure that the value at the partition is the final position of the number.

Hoare’s Partitioning Scheme

Hoare’s partitioning is shown below. This partitioning scheme is much faster than Lomuto’s. There are plenty of online resources on this most of them are confusing in the one on Wikipedia. I wrote piece here on QuickSort that could help.

private int partition(int left, int right){

int p = (left+right)/2;
int pVal = nums[p];
int i = left - 1;
int j = right + 1;
while(true){
do {
i++;
} while( nums[i] < pVal);
do {
j--;
} while (nums[j] > pVal);
if(i<j){
swap(i,j);
} else{
return j;
}
}
}

Solution: Kth Largest Element in an Array

The solution below beats 100% Java submissions on LeetCode for runtime.

public class KthLargestElement {
int kSmallest;
int[] nums;

public int findKthLargest(int[] nums, int k) {
this.nums = nums;
this.kSmallest = nums.length-k;
quickSelect(0, nums.length-1);
return nums[kSmallest];
}

private void quickSelect(int left, int right){
if (left < right) {
int p = partition(left, right);

if (kSmallest <= p) quickSelect(left, p);
else {
quickSelect(p+1, right);
}
}
}
private int partition(int left, int right){

int p = (left+right)/2;
int pVal = nums[p];
int i = left - 1;
int j = right + 1;
while(true){
do {
i++;
} while( nums[i] < pVal);
do {
j--;
} while (nums[j] > pVal);
if(i<j){
swap(i,j);
} else{
return j;
}
}
}

private void swap(int i, int j){
int temp = nums[i];
nums[i] = nums[j];
nums[j] = temp;
}
}

The Most Important Skill in the Future

· 3 min read
Asif Rehan
Senior Software Engineer, ML Infra @ Ford Pro

Asking questions is the most important skill in the future. Finding possible answers to those questions is the next most important skill.

The Power of Questions

A question helps you land on a set of possible answers or, in the case of open-ended questions, an infinite set of possibilities. These possible answers lead to hypotheses, which help validate or reinforce our knowledge. For the modern entrepreneur, this type of hypothesis testing is crucial.

Asking questions is also an optimal approach to life. The Bellman Equation in Reinforcement Learning models this approach: testing out possible answers and choosing the one that seems best. While you may face failures, statistically, this method gives you the highest chance of success.

Problem Solving Through Questions

Asking questions is a powerful technique for solving problems, especially personal ones. In a world where the internet provides access to vast amounts of information, the ability to ask the right questions becomes essential for finding meaningful answers. This is the essence of the scientific method.

With more information available to everyone, success will depend on the ability to form logical sequences of questions. Those who can identify problems hidden within questions and pursue solutions will drive positive change. Asking the right questions will be a survival skill in the age of the 4th Industrial Revolution.

Challenging the Status Quo

Question everything, especially the status quo. Ask yourself: Could this be better? Even small, overlooked problems can lead to impactful solutions. Improving something by just 1% and helping a million people benefit from it creates a significant Total Addressable Market (TAM).

Paul Graham encourages entrepreneurs to seize these opportunities. These micro-problems, when addressed, can lead to solutions that evolve into viable businesses. Asking questions such as "Could this be better?" can uncover these opportunities.

From Incremental Improvements to Big Innovations

Even tiny improvements can lead to major innovations. Thomas Edison, who held an astounding number of patents, famously said that he rarely had "genuine" ideas but instead improved existing solutions to existing problems. Often, problems disguise themselves as questions.

Frameworks for Asking Better Questions

Here are a few simple frameworks to help you ask better questions:

  1. Use Basic Math Concepts

    • Can you divide the problem into smaller parts?
    • Could multiple small solutions add up to a larger one?
    • Think critically by asking how you can add, divide, subtract, or multiply concepts.
  2. Find Patterns

    • Look for similarities or dissimilarities.
    • What patterns can you recognize that could help solve the problem?
  3. The "WH" Questions

    • Who, What, Where, When, Why, and How: These questions can guide critical thinking and decision-making.
    • Why are you doing this? What are you going to do? How will you do it?
    • Simon Sinek’s famous TED talk, Start with Why, emphasizes this approach.
  4. Set Timelines

    • Asking "When?" turns a dream into a plan of action.
    • Realizing that life is short and setting deadlines is critical for transforming dreams into reality.

By mastering the skill of asking the right questions, you equip yourself with the most valuable tool for success in the future. Whether as an entrepreneur, leader, or problem-solver, your ability to question effectively will set you apart.

Connected Components in Graphs

· One min read
Asif Rehan
Senior Software Engineer, ML Infra @ Ford Pro

There can be three types of graphs here.

1. Undirected Graph

For undirected graphs, we can use Depth First Search (DFS) to find the connected component number for each vertex. The runtime is O(|V|+|E|).

2. Directed Graph

Directed graphs can be of two types. Directed Acyclic Graphs (DAG) and General Directed Graphs. DAGs's do not have cycles. In general directed graphs, cycles may exist.

So, for directed graph what we need to find is called Strongly Connected Component. In order to find it, first create a new graph where original edges are in reverse direction.

Next, run DFS on this reverse graph to find postorder numbers and order vertices in descending order. This first node is guranteed to be in a source SCC in the original graph.

Next, starting from the first node above, run DFS and assign component number to each nodes. When all the connected nodes are visited from the first node, move to the second node.

Thus after two rounds of DFS, we can find SCC for a general directed graph.

Based on the DFS, the runtime is O(|V|+|E|).

My Application to OMSCS

· 7 min read
Asif Rehan
Senior Software Engineer, ML Infra @ Ford Pro

I applied for Georgia Tech's OMSCS program in 2018 and started my first semester in January 2019. Since I have been into the program for over a year, I felt sometimes that I should share my admission essays. After this Spring 2020 semester, I will be halfway there to graduate in terms on credits. So I want to share the essays at this point of my new academic journey for my own reflection, and hopefully secure some accountability to keep pushing towards the goals I have set for myself. It has been taking a great effort to balance my job as a Data Scientist at Ford Motor Company and studies going on in parallel. Thanks to my loving wife, Mou for all the support!

Now without any further ado, you can read the two essays right below. These were written in 2018. So the contents here reflect that time.

Georgia Tech Logo

Career Objectives and Background Essay – "What has prepared you for this program?"

Please describe your background (academic and extracurricular) and experience, including research, teaching, industry, and other relevant information.

Coming from a Transportation Engineering background, I learned C++ in college. But only much later as a Graduate Research Assistant at UConn, I applied programming skills in Python to build a prototype to estimate route level travel time from GPS data. For this, I implemented a Hidden Markov Model-based map-matching algorithm using Kernel Ridge Regression and Viterbi algorithm. Taking Applied Statistics courses and achieving 4.3 out of 4 in the Neural Networks course at UConn prepared me to handle challenging mobility problems. I also taught myself Java, implemented data structures and algorithms in Java, and studied Complexity Theory using Princeton, MIT, and University of Arizona course lectures available online.

While working at Metropia as a Research Engineer, I had a deep exposure to software and web architecture. There I actually covered many roles: researched new algorithms, developed software, discussed backend architecture, set up and maintained AWS servers and databases, and created dashboards using SQL. I researched and developed an algorithm to predict delay at traffic intersections and integrated with the real-time routing API. Eventually, I also developed a parameter optimization tool in Python that worked across a few AWS servers simultaneously improving routing prediction by a few percentage points. I developed an API predicting El Paso-Juarez border crossing time using GPS data from our app users. It involved building web-crawlers, statistical modeling, Neural Net regression, and a real-time reporting dashboard.

Since I have joined Ford as a Data Scientist, I am leading end-to-end development and commercialization of a new analytic product. I developed the algorithms, and the data pipeline by exploiting QuadTree for geospatial sampling and multithreaded API calls to collect 2 million data for the prediction model. Currently, I am working with the team to scale it globally.

With my experience in applying Computer Science, I am well prepared for this program.

Statement of purpose

Please give a Statement of Purpose detailing your academic and research goals as well as career plans. Include your reasons for choosing the College of Computing as opposed to other programs and/or other universities.

In summer 2014, while I was in my previous Master's program in Civil Engineering at UConn, I was mostly coding in Python for my graduate research project and was reading papers from Computer Science (CS) journals frequently. Later during the summer of 2016, I was working at a startup, Metropia. By then, I was able to build many algorithms and software tools using my programming skills but I realized that I lacked many fundamental CS concepts which I have been teaching myself using online resources. Neither it is efficient, nor it leads to any university endorsement for my skills. So my conviction to obtain a formal masters in CS has been growing stronger with time since then.

On the other hand, it is not realistic to leave my full-time job for graduate school either. I heard about the OMSCS program at the beginning of 2018. Right at that time, I joined Ford Motor Company as a Data Scientist. At Ford, I realize that the tools I build can be better if I have a deeper understanding of CS concepts. Fortunately, my manager agreed and he encouraged me to pursue OMSCS and Ford would support me with tuition fees. Three current colleagues at Ford enrolled in OMSCS program provided great positive reviews about the program and mentioned it is a rigorous program. Given the highly respected top-ranked Graduate CS program at Georgia Tech and its pioneering faculties, I intend to work hard, learn the most, and apply it to my job and career.

From my background in Mobility and Transportation, I find mobility problems heavily rely on CS theory and techniques. For example, in my work, I used parallel programming, a code profiler to optimize memory at runtime, and also developed APIs. But I want to go deeper and learn better ways to do it. I would like to take the opportunity of curated courses at OMSCS. I have come quite far by teaching myself CS skills, and I would like to get the missing CS degree to accelerate forward.

My career plan for the next 8-10 years is divided into three phases. In the short term, while I am enrolled in the degree, I would like to leverage my coursework to develop better analytic software products on my job. My focus will be to learn as much as possible about software engineering and build scalable machine learning tools. Courses like Graduate Algorithms and Machine Learning will be useful for this purpose. Also, the system design knowledge from courses like Software Development Process and Software Architecture and Design will be useful as well. I intend to choose machine learning specialization for my degree. I believe my current and immediate career trajectory will have many projects to apply all the learnings from the degree.

For the next 1-3 years after graduation, I would like to be able to lead analytic software development projects with my technical leadership. With a degree from Georgia Tech, I envision that my skills will be valued, and I will be leading a technical team.

Finally, 3-5 years after my graduation, I would like to be able to architect full-stack analytic software. This step will need more time for me to get enough experience. At this phase, I believe I will have enough exposure to the industry to find new product ideas and be a product manager. I intend to obtain an MBA to develop skills for running a company. I imagine I will be either leading a team within a large company or establishing my own that will develop software for emerging autonomous vehicles.

With my current role at Ford, I see the emerging new autonomous technologies closely and my transportation engineering background will be immensely valuable in creating new software products for the new upcoming transportation ecosystem. In the future, I may stay at my current company at Ford or move to other companies like General Motors, Google, Uber or Lyft, or maybe start my own. But with the OMSCS degree from Georgia Tech, I will be better able to progress in my career trajectory as I have outlined above.

Best way to setup PYTHONPATH for crontab

· 2 min read
Asif Rehan
Senior Software Engineer, ML Infra @ Ford Pro

When setting up a crontab job in Linux machine, these essential steps are required for a successful system operation

  1. Update the cron file by adding the new script on schedule

  2. Check the frequency of the schedule. Such as for running at 7 minutes interval, use */7 * * * * python /path/to/script.py Or for running every hour at 7th minute, use 7 * * * * python /path/to/script.py

  3. Check the file permission for the script. If it is not executable, then make it executable ls -l /path/to/script.py

Then if the file is not executable by the user add the permission by chmod 755 /path/to/script.py (you may need to have sudo access if you are not the owner of the file. For changing ownership of a file or folder, use chown command on Linux variants)

  1. Check the file permission for the output site if the script produces some files. If the continuing folder or the output file does not have a write-access, ensure it is writable. Follow the similar process as in step#3. You may channel any print statements to a file such as 7 * * * * python /path/to/script.py /path/to/output/file.txt 2>&1

Or better yet, use Python logging library to create useful log files and show if the program ran correctly.

How to unzip and read gzipped JSON files from URL in Python

· One min read
Asif Rehan
Senior Software Engineer, ML Infra @ Ford Pro

The Problem

Sometimes we end up zipping JSON files and putting up somewhere on the interweb. Now we need to read it back from the HTTP server and parse the file using Python. For that situation, let us assume that the zipped JSON is located at this URL:
http://example.com/python_list_turned_into.json.gz. To read this file, we need to do the following

  • Fetch the file using urllib2
  • Add a header which will be used to detect the gzip format
  • Then open the file and read the compressed file
  • Next, uncompress the file using gzip library and load it!

That's it!