WEBVTT

00:00.000 --> 00:08.000
We want to create categories for, let's say, west coasts and other.

00:08.000 --> 00:13.000
Of course, you can create whatever number of coasts or regions you want.

00:13.000 --> 00:16.000
That's up to you. We're not going to make it type out a whole lot here.

00:16.000 --> 00:20.000
Just for the sake of this example here.

00:20.000 --> 00:23.000
So this is,

00:23.000 --> 00:28.000
works, right? So working in words.

00:28.000 --> 00:31.000
And we've got the states.

00:31.000 --> 00:33.000
So.

00:33.000 --> 00:38.000
If I want to just watch what I'm doing and I want to put a state there.

00:38.000 --> 00:40.000
Just kind of compare.

00:40.000 --> 00:44.000
Next to this case.

00:44.000 --> 00:48.000
I want to make sure they go into the right case.

00:48.000 --> 00:53.000
In the case that ship state is.

00:53.000 --> 00:56.000
And you could say it's equal to something.

00:56.000 --> 01:01.000
But I'd rather use in so I can set one line or a certain thing.

01:01.000 --> 01:08.000
So I'm going to use when the ship states is in.

01:08.000 --> 01:11.000
And then I can say it's California.

01:11.000 --> 01:16.000
And I'm just going to copy all these here up here.

01:16.000 --> 01:18.000
Copy the one. I don't need the or.

01:18.000 --> 01:21.000
I'm going to select this.

01:21.000 --> 01:24.000
I don't know if you know there's about a lot of code editors when you highlight something.

01:24.000 --> 01:26.000
Let's say you just highlight these.

01:26.000 --> 01:29.000
You can hit quotes and it wraps it in quotes.

01:29.000 --> 01:32.000
You can select this and hit quote and it wraps it in quotes.

01:32.000 --> 01:35.000
So you don't have to type it.

01:35.000 --> 01:37.000
And then type it.

01:37.000 --> 01:40.000
You can simply select something.

01:40.000 --> 01:41.000
And when you get quote.

01:41.000 --> 01:43.000
It quotes it for you.

01:43.000 --> 01:46.000
A good coded code editor will do that.

01:46.000 --> 01:48.000
Bad editor.

01:48.000 --> 01:49.000
Maybe it does it.

01:49.000 --> 01:51.000
But Microsoft SQL Server does not.

01:51.000 --> 01:53.000
It's not smart.

01:53.000 --> 01:56.000
But it spends on your code editor.

01:56.000 --> 02:02.000
All right. So if I just do that.

02:02.000 --> 02:04.000
Oh, I didn't say what.

02:04.000 --> 02:05.000
So then.

02:05.000 --> 02:07.000
It is.

02:07.000 --> 02:13.000
And I say West Coast.

02:13.000 --> 02:14.000
All right.

02:14.000 --> 02:15.000
So we've got.

02:15.000 --> 02:17.000
Oregon.

02:17.000 --> 02:18.000
And it's find other guy.

02:18.000 --> 02:19.000
California.

02:19.000 --> 02:20.000
Right.

02:20.000 --> 02:21.000
So we got that.

02:21.000 --> 02:23.000
And you can do as many different codes as you want.

02:23.000 --> 02:27.000
If it's not that, we're just going to say it's other.

02:27.000 --> 02:29.000
You can go through type in lots of different ones.

02:29.000 --> 02:30.000
You want to do East Coast.

02:30.000 --> 02:32.000
I like if I want to copy this.

02:32.000 --> 02:33.000
Face it.

02:33.000 --> 02:35.000
All want East Coast.

02:35.000 --> 02:39.000
Start putting in some states.

02:39.000 --> 02:40.000
Whatever.

02:40.000 --> 02:41.000
Go down the line.

02:41.000 --> 02:45.000
Create as many different regions as you.

02:45.000 --> 02:47.000
And I can name this.

02:47.000 --> 02:53.000
Let this column be known as region.

02:53.000 --> 02:57.000
So I've got my different regions here.

02:57.000 --> 03:00.000
The purpose for it is maybe just the label.

03:00.000 --> 03:02.000
Now maybe you just want to label your stuff.

03:02.000 --> 03:06.000
But in this case, it's because I want to group them.

03:06.000 --> 03:11.000
So I want to group by my regions.

03:11.000 --> 03:13.000
And so I can only show people the region.

03:13.000 --> 03:16.000
I can't show people other columns.

03:16.000 --> 03:23.000
Except for I do need to add a calling for some sort of aggregate function.

03:23.000 --> 03:27.000
I want to know how many orders were shipped.

03:27.000 --> 03:30.000
And that's what I'm in the orders table.

03:30.000 --> 03:32.000
How many orders were shipped.

03:32.000 --> 03:36.000
So how do I do that as an aggregate function?

03:36.000 --> 03:40.000
And I know the number of words.

03:40.000 --> 03:43.000
So we have orders table.

03:43.000 --> 03:49.000
So we have things like sum in max, average, count.

03:49.000 --> 03:50.000
Counts.

03:50.000 --> 03:53.000
Because if I count rows, I'm counting orders.

03:53.000 --> 03:56.000
So I want to count.

03:56.000 --> 03:59.000
So now I know how many are broken down at D tree.

03:59.000 --> 04:02.000
If you just did a two regions.

04:02.000 --> 04:07.000
You should get 27 and 473.

04:07.000 --> 04:10.000
So that's just to show you that's the nice thing about doing.

04:10.000 --> 04:13.000
Comment that you can comment out of line to say,

04:13.000 --> 04:16.000
I want to keep it, but I'm not quite ready to delete it.

04:16.000 --> 04:19.000
And then you bring it back that it could change your code.

04:19.000 --> 04:23.000
So any line that is commented out is just ignored when that is right.

04:23.000 --> 04:25.000
So you can add little notes to yourself,

04:25.000 --> 04:27.000
or you could just comment out a whole line to say,

04:27.000 --> 04:29.000
I don't know, I don't want that right now.

04:29.000 --> 04:35.000
But I'm not quite ready to delete it yet.

04:35.000 --> 04:40.000
We can make a column to divide people into three groups.

04:40.000 --> 04:44.000
Based on when they created their accounts.

04:44.000 --> 04:50.000
So this would be users based on when they created their accounts.

04:50.000 --> 04:53.000
So got my users.

04:53.000 --> 04:56.000
There's a created at.

04:56.000 --> 05:02.000
So I want to create a new column for the created at.

05:02.000 --> 05:06.000
So I'm going to create a case.

05:06.000 --> 05:12.000
And in the case that that is 2019.

05:12.000 --> 05:13.000
So the year.

05:13.000 --> 05:17.000
So I want to extract the year from this.

05:17.000 --> 05:20.000
And if it's.

05:20.000 --> 05:23.000
2019 or earlier.

05:23.000 --> 05:28.000
Now, if I'm just saying it's equal to 2019 or equal to 2020,

05:28.000 --> 05:30.000
I could do a simplified case.

05:30.000 --> 05:33.000
But if I want to say it's 2019 or earlier,

05:33.000 --> 05:36.000
like maybe I don't know, maybe there are 2018, 2017.

05:36.000 --> 05:40.000
If I want to use that kind of less than or equal to.

05:40.000 --> 05:42.000
That's not a neat.

05:42.000 --> 05:43.000
That's not just a call, right?

05:43.000 --> 05:45.000
That's less than or equal to.

05:45.000 --> 05:50.000
So in that case, I need to do a search case where I move my.

05:50.000 --> 05:53.000
Kind of question part down into here.

05:53.000 --> 05:56.000
And I need just part of the date that I want.

05:56.000 --> 05:59.000
So date part because I don't want the whole dates.

05:59.000 --> 06:03.000
I just want the year part of this.

06:03.000 --> 06:07.000
From the created at.

06:07.000 --> 06:10.000
Created at.

06:10.000 --> 06:16.000
And when that is less than or equal to 2019.

06:16.000 --> 06:19.000
Then I wanted to say early.

06:19.000 --> 06:24.000
Those are early people.

06:24.000 --> 06:26.000
So I got a bunch of purpose.

06:26.000 --> 06:28.000
The rest are no because.

06:28.000 --> 06:31.000
You know, if I want to see also like just put the.

06:31.000 --> 06:34.000
Created at up here just to compare.

06:34.000 --> 06:37.000
So that's nice to just spot check yourself.

06:37.000 --> 06:38.000
Stiff test.

06:38.000 --> 06:40.000
Everything's probably good.

06:40.000 --> 06:41.000
Everything's seen good.

06:41.000 --> 06:43.000
Is your all 2019 is.

06:43.000 --> 06:45.000
As soon as they get to 2020.

06:45.000 --> 06:47.000
Okay.

06:47.000 --> 06:50.000
This one I can copy and paste.

06:50.000 --> 06:52.000
This could be 2020.

06:52.000 --> 06:55.000
But now it's equal to 2020.

06:55.000 --> 06:59.000
Those are my middle people.

06:59.000 --> 07:01.000
And for everybody else.

07:01.000 --> 07:05.000
Early.

07:05.000 --> 07:08.000
So now I got early middle.

07:08.000 --> 07:11.000
And then the catch all that we have at the end is just.

07:11.000 --> 07:15.000
For videos that sense that.

07:15.000 --> 07:17.000
So I've got now.

07:17.000 --> 07:19.000
A type of user.

07:19.000 --> 07:23.000
Right. So I got I'm going to call that user.

07:23.000 --> 07:26.000
And.

07:26.000 --> 07:31.000
The idea is I might want to know how many orders each group has made.

07:31.000 --> 07:36.000
Like are my early adopters ordering more are my lead adopters order ordering more.

07:36.000 --> 07:40.000
So the idea is that I could group by.

07:40.000 --> 07:42.000
That user type.

07:42.000 --> 07:47.000
But I can only then show people the user type so I can't show the created at anymore.

07:47.000 --> 07:49.000
And then.

07:49.000 --> 07:52.000
Just like we did before we're counting the number of.

07:52.000 --> 07:54.000
Oh wait.

07:54.000 --> 07:58.000
I want to know the number of orders.

07:58.000 --> 08:04.000
But we just figured out when users were creating their account.

08:04.000 --> 08:07.000
So if I had a count here.

08:07.000 --> 08:11.000
It's where you have to pay close attention to your question.

08:11.000 --> 08:13.000
In this query.

08:13.000 --> 08:16.000
What are these numbers telling me.

08:16.000 --> 08:19.000
What am I counting.

08:19.000 --> 08:22.000
I have a hundred users.

08:22.000 --> 08:25.000
And that adds up to 100.

08:25.000 --> 08:29.000
So it's pretty evenly spread between early middle and late.

08:29.000 --> 08:32.000
A little more towards the late right.

08:32.000 --> 08:36.000
That's the number of users in each group.

08:36.000 --> 08:42.000
But I want to know the number of orders made by each of those groups.

08:42.000 --> 08:46.000
So I needed to create those groups because I needed to divide users.

08:46.000 --> 08:48.000
And they created their accounts.

08:48.000 --> 08:52.000
But now I want to count orders, which is in another table.

08:52.000 --> 08:54.000
Now.

08:54.000 --> 09:02.000
Is there a way that I can deal with both user data and order data in the same query.

09:02.000 --> 09:03.000
Join.

09:03.000 --> 09:04.000
I can join them together.

09:04.000 --> 09:05.000
Right.

09:05.000 --> 09:07.000
Because when I want data into tables.

09:07.000 --> 09:10.000
I can put them together into one big table.

09:10.000 --> 09:12.000
I call it the megatube.

09:12.000 --> 09:14.000
It's not industry term.

09:14.000 --> 09:18.000
So I can take little tables and build them up into bigger tables.

09:18.000 --> 09:22.000
And if I look at my entity relationship diagram.

09:22.000 --> 09:27.000
Is there a way that I can join or connect users into their orders.

09:27.000 --> 09:28.000
Sure.

09:28.000 --> 09:29.000
I know.

09:29.000 --> 09:31.000
I know what user placed an order.

09:31.000 --> 09:34.000
I can join them based on a user ID.

09:34.000 --> 09:36.000
So right now.

09:36.000 --> 09:39.000
When you only have users.

09:39.000 --> 09:42.000
Every row is a user.

09:42.000 --> 09:46.000
If I then say, well, that's going to be you.

09:46.000 --> 09:50.000
And I'm going to join the orders table.

09:50.000 --> 09:56.000
On two columns that have equal type of data, like data.

09:56.000 --> 10:00.000
User ID, user ID.

10:00.000 --> 10:04.000
So in my user's table, it's called user ID.

10:04.000 --> 10:07.000
In my orders table, also called user ID.

10:07.000 --> 10:11.000
Same column name, just in two different tables.

10:12.000 --> 10:16.000
When I try to run it, it says a ambiguous.

10:16.000 --> 10:19.000
It wasn't ambiguous just a moment ago.

10:19.000 --> 10:24.000
What does a term ambiguous mean in this situation?

10:24.000 --> 10:28.000
We've seen this error before.

10:28.000 --> 10:33.000
Yeah, the same name is used in two tables.

10:33.000 --> 10:35.000
See, notice it says created at.

10:35.000 --> 10:38.000
Created at is in both tables.

10:39.000 --> 10:43.000
Because we know when the user created their accounts.

10:43.000 --> 10:47.000
And orders when they created the order.

10:47.000 --> 10:52.000
So the question is, which created at do I want?

10:52.000 --> 10:57.000
I want the user created at because I'm dividing users into groups.

10:57.000 --> 11:00.000
I don't care when the order was created.

11:00.000 --> 11:03.000
I care about when the user created their account.

11:03.000 --> 11:05.000
This was the user type, right?

11:05.000 --> 11:08.000
I was using the user table.

11:08.000 --> 11:11.000
So I just want to go back and add the user.

11:11.000 --> 11:13.000
Because it was confused.

11:13.000 --> 11:16.000
It's like, hey, do you want when orders were created?

11:16.000 --> 11:19.000
Or do you want when users were created?

11:19.000 --> 11:23.000
And now, if you remember, we had 500 orders.

11:23.000 --> 11:27.000
Sniff test says this, this seems right.

11:27.000 --> 11:28.000
Right?

11:28.000 --> 11:30.000
Adds up to 500.

11:30.000 --> 11:34.000
And now we, we've still divided people based on when they signed up.

11:34.000 --> 11:39.000
But I'm counting orders because when you join users to orders.

11:39.000 --> 11:43.000
You still have all the orders, but now you know who placed that.

11:43.000 --> 11:45.000
Because that's what joins do.

11:45.000 --> 11:49.000
Joins are not about just adding columns of user data.

11:49.000 --> 11:51.000
And order data.

11:51.000 --> 11:54.000
You're getting all the rows from those tables.

11:54.000 --> 11:58.000
And when you join users to orders with just a regular inner join,

11:58.000 --> 12:00.000
you're looking for matches.

12:00.000 --> 12:03.000
I don't care about users who never placed an order.

12:03.000 --> 12:06.000
I only care about users who did place an order.

12:06.000 --> 12:09.000
And it will match up for every single order.

12:09.000 --> 12:11.000
It's going to say who placed that order.

12:11.000 --> 12:15.000
So I will know who placed the order and what the order was.

12:15.000 --> 12:16.000
Right?

12:16.000 --> 12:18.000
I mean, not what was in the order because we didn't put line items in there.

12:18.000 --> 12:20.000
I don't need that.

12:20.000 --> 12:22.000
But I'll get a row for every single order.

12:22.000 --> 12:26.000
I know who placed the order.

12:26.000 --> 12:29.000
So now we're counting a much bigger data set.

12:29.000 --> 12:34.000
Because when you do joins, it's not just about how many columns you have.

12:34.000 --> 12:36.000
It also affects how many rows you have.

12:36.000 --> 12:41.000
Because if you're getting all the orders table with all those 500 orders.

12:41.000 --> 12:43.000
And you start with the underdusers.

12:43.000 --> 12:47.000
Well, those underdusers are spread across all those 500 orders.

12:47.000 --> 12:49.000
Just repeat it.

12:49.000 --> 12:51.000
But you're going to get all 500 orders.

12:51.000 --> 12:57.000
So when you're counting, you're now counting orders.

12:57.000 --> 13:02.000
If you wanted to see the join, we probably should have done the join first.

13:02.000 --> 13:05.000
Because remember the order of execution.

13:05.000 --> 13:11.000
If you want to see your data as long as possible to be able to envision what it's doing.

13:11.000 --> 13:15.000
If you write your code, if you can plan ahead.

13:15.000 --> 13:19.000
Do it in the order of execution to see it as much as possible.

13:19.000 --> 13:24.000
Because if we did the from and the join, you would have been able to see the join.

13:24.000 --> 13:29.000
Then we could have done the group by after that.

13:29.000 --> 13:31.000
But we would have seen the join happen.

13:31.000 --> 13:34.000
Now you can always go back and add the join just like I did.

13:34.000 --> 13:38.000
You just don't see like it didn't see the joint itself.

13:38.000 --> 13:41.000
You saw it clearly made it perfect.

13:41.000 --> 13:47.000
But you didn't see what was going on because the group by hides all that stuff.

13:47.000 --> 13:50.000
So if it's all seems kind of mysterious and you're like,

13:50.000 --> 13:52.000
I really wish I could have seen that joy.

13:52.000 --> 13:54.000
Do the joint first.

13:54.000 --> 13:58.000
Then come back in and do the group by.

13:58.000 --> 14:02.000
It might sound like that's such an obvious thing, but I can't tell you.

14:02.000 --> 14:06.000
When I was first learning SQL, I struggled with like, how do I see stuff?

14:06.000 --> 14:10.000
And I finally realized, hey, if I just do things in the order of execution,

14:10.000 --> 14:15.000
the whole point of this is the it's a funnel that each step filters more and more and more.

14:15.000 --> 14:18.000
And once I realized, hey, if I just follow that order,

14:18.000 --> 14:22.000
I'll be able to see things as long as possible.

14:22.000 --> 14:25.000
That's not an obvious thing necessarily that everybody figures out.

14:25.000 --> 14:31.000
I was happy once I finally figured that out because now I could understand what was going on behind the scenes.

14:31.000 --> 14:33.000
Step by step by step.

14:33.000 --> 14:36.000
That wasn't immediately obvious.

