Data Team

Thank you for sending in your CV! You are about to start on a journey towards joining one of the most promising AI and machine learning startups in Israel today.
 
The following test should take up to 4 hours, and successful completion will allow you to proceed to the interview stage at Intelligo. The recommended time for each question is noted. Questions 1-3 should take 2 hours in total. Questions 4-7 are timed, at about 2 to 3 minutes each. The final question is a data analysis task and should take about 90 minutes.
The total time taken is recorded by the system.
 
Please read the questions carefully. They contain all the information you need to complete the tasks successfully. If you feel that information needed for a complete answer is missing, make whatever assumptions you need to proceed. Please state the assumptions you made along the way and your reasons for making them.
 
 
We suggest writing your answers in a local (Word) document, in case you experience any technical issues during the test.
Good luck and we look forward to meeting you!
Please enter your details below:
Name:
Email:
Question 1 - (warm-up) – Manual matching (~40min)
 
Our research target is Edward Lampert, who is the chairman and CEO of Sears Holdings.
 
Here is an Excel file with six sheets, containing information gathered by an automatic research tool:
 
1. Person: personal information on the individual, such as address and SSN (Social Security Number, roughly equivalent to an Israeli ID number, Teudat Zehut).
2. News: Information on media mentioning the research target
3. Officer: Information about the research target’s employment history
4. Twitter: Information on Twitter accounts located for the research target
5. Contributions: Information on political donations given by the research target
6. Legal: Information on lawsuits involving the research target
 
Since the results are automated, some of them may refer to a different Edward Lampert.
 
Your task is to review each result and mark it 0 if you believe it does NOT refer to our research target, or 1 if you determine that it does refer to our research target.
In borderline cases, please write a brief comment explaining why you chose 1 or 0. The explanation is often more important than the result, because it lets us understand your thought process.
 
You will re-upload the completed Excel file on the next page.
Please upload the Excel file with the completed 0s and 1s.

Question 2a - Automatic Matching (~20min)

To build a machine that can automatically decide which results should be 1s (matched) and which should be 0s (non-matched), we defined specific features for comparing the data.

One of the main features compares the subject name (e.g. Edward Lampert) with the name found in the result ('resultName').
In the following Excel sheet (file here: nameComparingFeature) you can see the name comparison and its associated similarity score for several results.

(For every result you can also see the analyst's manual feedback on that result, but remember that name comparison is only one of the factors an analyst considers when labeling a result.)


2a. What is the method (function) that generates the similarity score ('Feature Score') in the Excel sheet? You can describe the calculation in your own words or write a formula.

 

 

Question 2b (~15min)
 

 

A good comparison method mimics an analyst's work methodology as closely as possible.
(Think about how your brain works when you compare personal names, and try to translate that into one or more functions.)

2b. What are the disadvantages of this method, as calculated in the Excel sheet?
Question 2c (~20min)
 
2c.
 
Present a method that you think will perform better. A method doesn't have to produce a single score (as in this example); it can also produce a series of scores (i.e., a method can contain multiple, different functions), and the machine will take the whole series into account. For example: one score for the first name and one score for the last name.
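As an illustration of the "one score for the first name, one for the last name" idea, here is a minimal sketch using only Python's standard library. It is hypothetical and not the method behind the sheet's 'Feature Score': it assumes the first token is the first name and the last token is the last name, uses difflib's SequenceMatcher ratio for character similarity, ignores middle names, and treats a one-letter first name as an initial.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] using difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def _tokens(name: str) -> list[str]:
    # Lower-case and treat dots, commas and hyphens as separators; drop empty tokens.
    for ch in ".,-":
        name = name.replace(ch, " ")
    return [t for t in name.lower().split() if t]


def name_scores(subject: str, result: str) -> dict[str, float]:
    """Hypothetical sketch: separate first-name and last-name scores.

    Assumes first token = first name and last token = last name, so middle
    names/initials are ignored. A one-letter first name is treated as an
    initial that matches any first name starting with the same letter.
    """
    s, r = _tokens(subject), _tokens(result)
    if not s or not r:
        return {"first": 0.0, "last": 0.0}
    s_first, r_first = s[0], r[0]
    if len(s_first) == 1 or len(r_first) == 1:
        first = 1.0 if s_first[0] == r_first[0] else 0.0
    else:
        first = _ratio(s_first, r_first)
    last = _ratio(s[-1], r[-1])
    return {"first": round(first, 3), "last": round(last, 3)}


if __name__ == "__main__":
    for candidate in ["E. Lampert", "Edward S. Lampert", "Eddie Lampert", "Edward Lambert"]:
        print(candidate, name_scores("Edward Lampert", candidate))
```

Keeping the two scores separate lets the machine weigh a last-name mismatch more heavily than a first-name variation. This sketch does not handle nicknames (e.g. "Eddie") or reversed "Last, First" order; those would need extra rules such as a nickname table.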

On the next slide you can re-upload the Excel file with your suggested method's score(s), but this is optional. Describe your method and its function(s) here.
Please upload a table with your responses in Word or Excel format

You can re-upload the Excel file with your suggested method's score(s), but this is optional.

Question 3 – Understanding which information is obtainable through each source (~20min -> 10min per scenario)

Background

Assume you want to collect information on an individual using the automated machine.

 

Assume that the machine is able to keep only the following details about a person: 

  • Full name
  • Date of birth
  • Address
  • Employment
  • SSN

The machine can use those details for two purposes: 

  1. Search the sources and collect results.
  2. Decide whether a result is 1 or 0.

The machine works in the following manner ('run process'):

  1. It receives an input from the user and saves it.
  2. It searches a source using what it knows so far about the person (if it doesn't have the minimum input for the source, no results will be received).
  3. It decides if the results are 1 or 0 based on what the machine knows so far about the person.
  4. If results that received a 1 contain details about the person that were not yet known, the machine stores the new details and uses them for future searches and scoring.
  5. The machine repeats stages 2-4 until it finishes going over all of the sources. Each source can be searched only once!

 

Each source has a minimum input that must be provided in order to get results. Without this input, the source cannot be used:

  • Twitter - full name
  • Person - SSN OR (full name + employment)
  • News - full name
  • Officer - employment + (full name OR address)
  • Contributions - full name
  • Legal - full name + address
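To reason about search orders, it can help to simulate the run process. The sketch below encodes the minimum inputs listed above as predicates over the set of known fields and replays stages 2-5 for a given order. Which new fields each source's matched results contribute (`gains`) is not specified in the task, so it is a hypothetical input you supply. The brute-force `best_order` only maximizes how many sources can be searched; it ignores how well the matcher scores results (searching a source later, with more known details, may give better 1/0 decisions), so treat it as a starting point rather than an answer.

```python
from itertools import permutations

# Minimum inputs from the task, as predicates over the set of known field names.
MIN_INPUT = {
    "Twitter": lambda k: "full_name" in k,
    "Person": lambda k: "ssn" in k or {"full_name", "employment"} <= k,
    "News": lambda k: "full_name" in k,
    "Officer": lambda k: "employment" in k and ("full_name" in k or "address" in k),
    "Contributions": lambda k: "full_name" in k,
    "Legal": lambda k: {"full_name", "address"} <= k,
}


def run(order, initial, gains):
    """Simulate one 'run process' and return the sources that could be searched.

    `gains` maps a source to the fields its matched (score 1) results are
    ASSUMED to add; it is a hypothetical input, not part of the task.
    """
    known = set(initial)
    searched = []
    for source in order:  # each source is visited (and searched) at most once
        if MIN_INPUT[source](known):
            searched.append(source)
            known |= gains.get(source, set())
    return searched


def best_order(initial, gains):
    """Try all 720 orders and return one that searches the most sources."""
    return max(permutations(MIN_INPUT), key=lambda o: len(run(o, initial, gains)))


if __name__ == "__main__":
    # Hypothetical gains: News reveals employment, Person reveals SSN/DOB/address/employment.
    example_gains = {
        "News": {"employment"},
        "Person": {"ssn", "date_of_birth", "address", "employment"},
        "Officer": {"address"},
    }
    print(best_order({"full_name", "address"}, example_gains))
```

With a full name and address only, Person and Officer stay blocked until some earlier source supplies employment (or an SSN), which is why the order must depend on the input.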

You are the designer of the machine, so you decide the order in which it searches the sources,
e.g. 1. Twitter, 2. Person, 3. Legal, etc.

Your objective is to get as many correct results as possible in each run (stages 1-5 of the 'run process') of the machine. The order of the sources should depend on the user input, which may differ from run to run.

The sources you have at your disposal are the same as in the attached Edward Lampert excel document (Person, News, Officer, Twitter, Contributions and Legal).  

 

Your Task

In each of the following 3 cases, your machine will receive a different input.

 

For each of the following inputs (i.e., the only information you initially have about the target), explain in what order you would run the sources.

 

Remember to take into account both the minimum input of each source and the machine’s automatic match score abilities. Fully explain your thought process and the reasons for your choices.

3a - The input is full name and address. In what order would you run the sources? (3min)
Order
Twitter (min input: full name)
Person - (min input: SSN OR (full name + employment))
News - (min input: full name)
Officer - (min input: employment + (full name OR address))
Contributions - (min input: full name)
Legal - (min input: full name + address)
Explain your choices for 3a:
3b - The input is full name and employment. In what order would you run the sources? (3min)
Order
Twitter (min input: full name)
Person - (min input: SSN OR (full name + employment))
News - (min input: full name)
Officer - (min input: employment + (full name OR address))
Contributions - (min input: full name)
Legal - (min input: full name + address)
Explain your choices for 3b:
The following 4 questions are more general in nature. You have between 1 and 3 minutes to answer each one.
Each question has only one correct answer.
Question 4 (3min - timed)
 
Each analyst on data team A has completed 3/4 as many tasks as each analyst on data team B. If team A has 4/5 as many analysts as team B, what fraction of all the tasks completed by both teams did team B complete?
1/2
2/5
3/5
4/5
5/8
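A quick way to check this kind of ratio problem is to plug in concrete numbers. The sketch assumes team B has 5 analysts completing 4 tasks each; these values are arbitrary, and any positive choice gives the same fraction.

```python
from fractions import Fraction

b_analysts, b_tasks_each = 5, 4              # assumed values; any positive numbers work
b_total = b_analysts * b_tasks_each
a_total = (Fraction(4, 5) * b_analysts) * (Fraction(3, 4) * b_tasks_each)  # 4/5 the analysts, 3/4 the tasks each
share_b = b_total / (a_total + b_total)      # team B's share of all tasks
print(share_b)  # 5/8
```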
Question 5 (3min - timed)
 
In a certain country, the unemployment rate among construction workers dropped from 16 percent on September 1, 1992, to 9 percent on September 1, 1996. If the number of all construction workers (employed and unemployed) was 20 percent greater on September 1, 1996, than on September 1, 1992, what was the approximate percent change in the number of unemployed construction workers over this period?
50% decrease
30% decrease
15% decrease
30% increase
55% increase
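The percent change can be checked by fixing an arbitrary baseline headcount; the sketch assumes 100 construction workers in 1992 (the result does not depend on this choice).

```python
n_1992 = 100.0                  # assumed baseline headcount; any positive value works
unemployed_1992 = 0.16 * n_1992
n_1996 = 1.20 * n_1992          # all construction workers grew by 20%
unemployed_1996 = 0.09 * n_1996
pct_change = (unemployed_1996 - unemployed_1992) / unemployed_1992 * 100
print(round(pct_change, 1))     # -32.5
```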
Question 6 (120 secs - timed)

Bill traveled for 3 hours.
We are given two more statements about Bill:
 
1. He traveled a total of 120 miles.
2. He traveled half the distance at 30 miles per hour, and half the distance at 60 miles per hour.
 
If he did not stop along the way, what was Bill's average speed for the trip?
Statement (1) by itself is sufficient to answer the question, but statement (2) by itself is not
Statement (2) by itself is sufficient to answer the question, but statement (1) by itself is not
Statements (1) and (2) taken together are sufficient to answer the question, even though neither statement by itself is sufficient
Either statement by itself is sufficient to answer the question
Statements (1) and (2) taken together are not sufficient to answer the question, requiring more data pertaining to the problem
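Each statement can be checked separately. Statement (1) gives distance over time directly. For statement (2), with total distance d the time is (d/2)/30 + (d/2)/60 = d/40, so the average speed is 40 mph whatever d is; the sketch below confirms this with exact fractions for two different distances.

```python
from fractions import Fraction

total_hours = 3

# Statement (1) alone: average speed = distance / time.
avg_1 = Fraction(120, total_hours)


# Statement (2) alone: half the distance at v1, half at v2.
def avg_speed_from_halves(d, v1=30, v2=60):
    time = Fraction(d, 2) / v1 + Fraction(d, 2) / v2
    return d / time  # the d cancels, so the result is independent of d


print(avg_1, avg_speed_from_halves(120), avg_speed_from_halves(7))  # 40 40 40
```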
Question 7 (150 Secs - timed)
 
A sink contains exactly 12 liters of water. If water is drained from the sink until it holds exactly 6 liters of water less than the quantity drained away, how many liters of water were drained away?
 
 
2
3
4.5
6
9
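Writing x for the liters drained, the remaining water is 12 - x and must equal x - 6, so 12 - x = x - 6, which gives x = 9. A short check:

```python
total = 12
x = (total + 6) / 2          # from 12 - x = x - 6  ->  2x = 18
assert total - x == x - 6    # remaining water is 6 liters less than the amount drained
print(x)  # 9.0
```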
A Break?

The next slide starts a 90-minute data analysis task. You can use your favourite tool to answer it (Excel, Python, R, SAS, etc.).

Move next when you are ready.
Comparing Models (max 90 min)

The following zip file contains 3 CSV files. Each file represents a different model (A, B, C), and each model was run at a different time.
Each file contains the following data:
- DataSourceId - The data source which the system pulled results from
- subjectId - The subject Identifier which the system was looking for 
- score - the system's automatic match decision (1 for records that were matched, 0 otherwise)
- label - the analyst's manual label. The analyst checked each record and marked 1 if the record mentioned the subject, and 0 otherwise.

Your goal is to compare the models' performance in order to decide which model performed best.
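One reasonable starting point, sketched below with only the Python standard library, is to treat the analyst `label` as ground truth and the model `score` as the prediction, then compute a confusion matrix plus precision, recall, F1 and accuracy for each model. The column names come from the task; the assumption that `score` and `label` hold 0/1 values, and the file names in the usage comment, are hypothetical. Because the models ran at different times, the sketch also breaks metrics down per DataSourceId so that a different mix of sources does not distort the comparison. Precision and recall are one choice of metrics, not the prescribed answer.

```python
import csv
import sys
from collections import defaultdict


def confusion(rows):
    """Count TP/FP/FN/TN with `label` as ground truth and `score` as the prediction."""
    c = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for r in rows:
        pred, truth = int(float(r["score"])), int(float(r["label"]))  # assumes 0/1 values
        if pred and truth:
            c["tp"] += 1
        elif pred:
            c["fp"] += 1
        elif truth:
            c["fn"] += 1
        else:
            c["tn"] += 1
    return c


def metrics(rows):
    c = confusion(rows)
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (c["tp"] + c["tn"]) / len(rows) if rows else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy, **c}


def by_source(rows):
    """Metrics per DataSourceId, since each model may have seen a different source mix."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["DataSourceId"]].append(r)
    return {src: metrics(g) for src, g in groups.items()}


def load(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Usage (hypothetical file names): python compare_models.py model_A.csv model_B.csv model_C.csv
    for path in sys.argv[1:]:
        m = metrics(load(path))
        print(path, {k: round(v, 3) if isinstance(v, float) else v for k, v in m.items()})
```

A model with high precision but low recall misses true matches, while the reverse floods analysts with false matches; which trade-off matters more should drive the final choice.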


Download link: compare_models_file


1. Briefly describe the steps you took to measure and compare the models' performance.
2. Which model did you find to be the best? Why?

In the next slide you can upload an Excel or R/Python script to show your process and/or figures. You can also upload a zip file if you want to share multiple files.
Please move to the next slide only when you've finished the task.
Please upload the file with the completed 21Qs
You can upload an Excel file or a zip file with an R/Python script to show your process and/or figures. If you want to share multiple files, please put them all into a single zip file.

Please do not upload a script file directly (.py/.R/etc.).
Script files can be uploaded only as part of a zip/rar file.



Once you press Finish below, your test will be submitted. Good luck!