The goal of this project is to make a model which will predict a customer's loan status given a number of attributes such as the current loan amount and whether the loan is a long term loan or a short term loan.

Import packages

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('LoansTrainingSet.csv')
df.head()
C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py:2698: DtypeWarning: Columns (16) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
Out[1]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose Monthly Debt Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens
0 000025bb-5694-4cff-b17d-192b1a98ba44 5ebc8bb1-5eb9-4404-b11b-a6eebc401a19 Fully Paid 11520 Short Term 741.0 10+ years Home Mortgage 33694.0 Debt Consolidation $584.03 12.3 41.0 10 0 6760 16056 0.0 0.0
1 00002c49-3a29-4bd4-8f67-c8f8fbc1048c 927b388d-2e01-423f-a8dc-f7e42d668f46 Fully Paid 3441 Short Term 734.0 4 years Home Mortgage 42269.0 other $1,106.04 26.3 NaN 17 0 6262 19149 0.0 0.0
2 00002d89-27f3-409b-aa76-90834f359a65 defce609-c631-447d-aad6-1270615e89c4 Fully Paid 21029 Short Term 747.0 10+ years Home Mortgage 90126.0 Debt Consolidation $1,321.85 28.8 NaN 5 0 20967 28335 0.0 0.0
3 00005222-b4d8-45a4-ad8c-186057e24233 070bcecb-aae7-4485-a26a-e0403e7bb6c5 Fully Paid 18743 Short Term 747.0 10+ years Own Home 38072.0 Debt Consolidation $751.92 26.2 NaN 9 0 22529 43915 0.0 0.0
4 0000757f-a121-41ed-b17b-162e76647c1f dde79588-12f0-4811-bab0-e2b07f633fcd Fully Paid 11731 Short Term 746.0 4 years Rent 50025.0 Debt Consolidation $355.18 11.5 NaN 12 0 17391 37081 0.0 0.0

Check for duplicated rows

In [2]:
df[df.duplicated()==True]
Out[2]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose Monthly Debt Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens
14 00029f9f-0cc5-4d4e-aabc-ea4a7fe74e12 afbc2fa3-3bad-4d48-b691-829aed78bad5 Charged Off 17961 Short Term 701.0 < 1 year Own Home 53053.0 Debt Consolidation $596.85 9.9 43.0 7 0 6810 22775 0.0 0.0
19 00035328-2636-4390-8322-5841de482d2b 28eb75ac-6a09-4057-b73f-726c72ebf933 Charged Off 11604 Short Term 729.0 6 years Home Mortgage 35315.0 Debt Consolidation $662.16 15.0 NaN 12 0 17092 20743 0.0 0.0
32 0006572c-9567-484d-b49b-bfe06536aa96 c1a7ba66-9f36-4f5c-86cd-1aa7ad68b954 Charged Off 14727 Short Term 740.0 3 years Home Mortgage 70690.0 Debt Consolidation $1,083.91 18.0 27.0 12 0 43375 225911 0.0 0.0
37 000795b0-d767-42bf-837a-80d5b78c1e50 b533c78c-3096-4914-acb1-5453c0ff76a4 Charged Off 19731 Long Term 623.0 3 years Rent 43846.0 Buy House $738.08 16.2 55.0 13 1 9031 17955 1.0 0.0
40 0008a65c-3054-4b5d-931d-8352021f813d 04513d99-d809-4eb6-b4bb-6abc767d861e Charged Off 7066 Short Term 741.0 10+ years Own Home 60564.0 Debt Consolidation $706.58 14.6 56.0 4 0 6229 9242 0.0 0.0
106 001b2e80-1220-40f4-87ad-63fc4814bdfe f7d50460-c0a5-45d0-ae25-891947b4c7b9 Charged Off 11316 Long Term 703.0 6 years Rent 80832.0 Debt Consolidation $714.02 17.8 71.0 5 1 3666 7376 1.0 0.0
118 001df440-3495-4ea7-b519-b7b85dd7ad0e c550dd85-dccb-4441-b2fe-a06e90bf48de Charged Off 22059 Long Term 679.0 10+ years Home Mortgage 200540.0 Debt Consolidation $578.23 24.1 38.0 10 0 17642 41904 0.0 0.0
131 0021a33d-62ce-44fa-9c8f-7b5396a1be81 9681f4f0-4d57-48e2-917d-a5f6d3accaab Charged Off 11945 Short Term NaN 5 years Rent NaN Debt Consolidation $675.21 18.6 NaN 8 0 5258 22278 0.0 0.0
157 0027a983-b2d4-4363-8191-2e5480a92549 0c8991e7-43fe-478b-94cc-ba1185f3eb05 Charged Off 10057 Short Term NaN 3 years Rent NaN Debt Consolidation $1,089.87 16.4 15.0 11 0 11615 14234 0.0 0.0
167 002a0be9-3c47-4471-bfac-130d33cdb038 e6dad351-ac76-4453-827a-7841ceaf0914 Charged Off 19173 Short Term 609.0 < 1 year Home Mortgage 80728.0 Other $905.50 19.6 14.0 5 0 12594 12619 0.0 0.0
182 002d9230-a4a2-4a53-b2e6-95df00866341 a96d08d2-cbb3-42d8-9718-909f1ccfc9ad Charged Off 9795 Short Term 687.0 6 years Rent 27496.0 other $788.22 6.6 40.0 4 0 2290 4626 0.0 0.0
201 00336793-40a5-448c-81a6-528bf04a04ba 7b182b4a-d3ed-419a-b889-83f44fff2203 Charged Off 7886 Short Term 741.0 5 years Rent 23854.0 Debt Consolidation $415.45 6.2 NaN 6 0 5394 6801 0.0 0.0
207 0033eb55-3702-4a17-8e68-ef95f3f57053 58a5de0c-b628-4730-a25f-5239ab65e191 Charged Off 2297 Short Term 708.0 5 years Rent 27564.0 Debt Consolidation $243.48 6.9 NaN 10 0 3859 12449 0.0 0.0
221 00393195-42fd-4b0f-a019-10cb0c86dce4 3fc2799b-87d2-4068-b4d9-abfba07be5be Charged Off 15958 Short Term 713.0 10+ years Own Home 48800.0 Debt Consolidation $727.93 17.0 NaN 9 1 11523 18146 0.0 0.0
228 003a346b-6ede-4179-8991-c63c4ef9af55 ecad702f-d39e-46c9-9418-ac291588bc2d Charged Off 15018 Long Term 732.0 9 years Home Mortgage 45054.0 Debt Consolidation $1,006.21 17.0 NaN 15 0 15460 28058 0.0 0.0
230 003a4f3b-7599-4b5a-88a6-f569bb8bc955 32c9ce22-5628-4c4f-8a30-3c1e7df3e817 Charged Off 17304 Short Term 657.0 5 years Own Home 46823.0 Debt Consolidation $1,069.13 13.0 NaN 12 0 16000 16394 0.0 0.0
272 0044a2d5-3ffa-4103-bc5a-d1c9b77e64ac 70d9d527-c3dc-45dc-9661-de97eb5ac94f Charged Off 16356 Short Term 709.0 10+ years Home Mortgage 60486.0 Debt Consolidation $1,048.42 20.4 36.0 8 1 11073 19563 1.0 0.0
307 004d4be5-c8b9-42cc-871c-313a58feb263 2937ddf0-c778-459f-b602-3e0a402f68a4 Charged Off 10799 Long Term 708.0 2 years Rent 29451.0 Debt Consolidation $912.98 24.1 NaN 12 0 14298 18545 0.0 0.0
315 004ee15f-da2a-4590-8f9a-ac139929f867 c5afc0ee-482b-441a-8c81-23165bc95eef Charged Off 12072 Short Term 743.0 < 1 year Home Mortgage 39420.0 Debt Consolidation $683.28 9.5 NaN 13 0 4096 22628 0.0 0.0
321 0051664e-e2f1-4f60-8ccb-f59fba657148 0ce8d48e-9584-4736-97ef-348e95a3e536 Charged Off 8891 Short Term 732.0 3 years Rent 61250.0 Debt Consolidation $1,260.73 12.9 NaN 12 0 8959 17397 0.0 0.0
341 0054b992-5d03-4624-83c3-b156ee89f393 b7693a5e-da84-441e-bad8-b0508ab95884 Charged Off 10747 Long Term 6600.0 6 years Rent 80352.0 Debt Consolidation $1,459.72 14.0 66.0 7 0 2480 2914 0.0 0.0
349 005610f4-3fcd-4344-baa1-c53111dea319 b283c93c-337f-4cb8-a3ea-2435ed5fb0c4 Charged Off 23312 Short Term 6760.0 8 years Rent 76853.0 Debt Consolidation $954.25 9.2 NaN 10 0 14038 18643 0.0 0.0
381 005fdf32-84ea-4e1c-9802-2d8d418c4655 d8076865-2835-40ce-9326-7b231464bb27 Charged Off 3020 Short Term 699.0 3 years Own Home 59389.0 Debt Consolidation $673.08 6.5 NaN 17 0 17159 28503 0.0 0.0
418 006a2c4f-30a0-43cc-ab8f-dd152a1c3c4b 6624426a-79bc-4d53-b32b-536a284cac5f Charged Off 35053 Short Term NaN 3 years Rent NaN Business Loan $809.13 13.3 NaN 25 0 5153 42585 0.0 0.0
420 006a5d3b-d802-4874-a13e-0df0ff838b5b 173f207d-8d77-406f-b596-3f641e42ff4f Charged Off 27359 Short Term 7460.0 10+ years Home Mortgage 390840.0 Debt Consolidation $5,243.77 27.3 NaN 22 0 257406 325008 0.0 0.0
432 006be99e-94e5-4c24-84af-c6106771a26c cc35b2b5-97b3-4257-b15f-7c66fe7d45f9 Charged Off 18336 Long Term 703.0 10+ years Home Mortgage 66398.0 Debt Consolidation $1,101.10 15.4 NaN 11 0 47925 86978 0.0 0.0
437 006d81cf-6ced-4840-9a33-9783fc02d841 e1fbde58-5f55-42ed-af5a-cafe9d7e22c9 Charged Off 7662 Long Term 726.0 10+ years Home Mortgage 26469.0 Debt Consolidation $467.63 19.8 79.0 6 0 4607 5339 0.0 0.0
443 006f090c-fbd5-466c-ae1e-0a2c1df1f31d 1d2a9045-5e06-43ff-865f-5e393d65873f Charged Off 11933 Long Term 722.0 8 years Home Mortgage 74580.0 Debt Consolidation $1,280.29 25.9 20.0 15 0 17883 31429 0.0 0.0
457 0074c307-71da-4daa-8c25-83751b831101 77bbd683-dd15-438f-a902-2dbb04375f45 Charged Off 3564 Short Term 712.0 8 years Rent 45545.0 Debt Consolidation $565.52 15.8 50.0 8 0 3763 5176 0.0 0.0
460 00750cae-1b87-45cd-9ce5-ee6e002bf2ad fea3d406-4070-45cf-932e-6e26c8666023 Charged Off 31709 Long Term NaN 2 years Rent NaN other $1,307.99 16.8 NaN 15 0 3624 27246 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
256417 ff6d32f8-d4f2-4a9e-a4e0-690781eaeb97 a9a46824-4f43-49c6-aaca-8b2f9509f7f0 Charged Off 12248 Short Term NaN 10+ years Rent NaN Debt Consolidation $772.36 17.1 NaN 6 0 6808 9796 0.0 0.0
256445 ff73bf26-fd77-4433-9246-987173784524 68d6ec31-a319-447b-9190-af208793f9f0 Charged Off 16133 Long Term 702.0 5 years Rent 60498.0 Debt Consolidation $630.69 16.0 NaN 6 0 14663 20281 0.0 0.0
256469 ff79d91e-661c-4da4-8ab4-acd0c73995a3 98558920-bb9b-493c-827e-b41e805bc705 Charged Off 4914 Short Term NaN NaN Rent NaN Debt Consolidation $250.61 18.0 NaN 6 1 5768 7077 1.0 0.0
256472 ff7a814f-a952-4950-9eea-7779b33e3a8d c8362588-cb64-4b17-804d-95ba7b461ee3 Charged Off 10093 Short Term NaN 8 years Home Mortgage NaN other $1,396.19 17.1 38.0 12 0 59702 64334 0.0 0.0
256474 ff7a8fe7-888e-49c3-9305-9c6e34e6f6f2 03c22e05-969a-4673-aa26-9621a355f458 Charged Off 24490 Long Term 739.0 1 year Home Mortgage 63674.0 Debt Consolidation $986.95 19.6 NaN 13 0 5416 24288 0.0 0.0
256476 ff7b269c-cacf-4440-a86b-ab64ebb0d436 ea18d4d2-cb2a-4043-839d-808ea3e7a11b Charged Off 10156 Short Term 746.0 4 years Home Mortgage 73123.0 Debt Consolidation $1,413.72 28.9 NaN 9 0 23791 31470 0.0 0.0
256502 ff7fce5a-3f95-4f03-9796-a9e60086e9c5 aa619302-49c8-472b-bf76-934a14f65254 Charged Off 15656 Long Term 725.0 8 years Rent 43054.0 Debt Consolidation $681.69 11.5 NaN 5 0 3632 6167 0.0 0.0
256512 ff81879a-8511-4b4a-a3ac-6bd51aaa6293 34707ab0-51dc-4cd8-a0c8-61b19bc45839 Charged Off 7859 Short Term 721.0 1 year Rent 29472.0 Debt Consolidation $397.87 11.9 72.0 16 0 3350 12454 0.0 0.0
256523 ff847f57-6670-4b8e-b0ac-aa6a9b2efa69 647be725-9aff-4d7d-8b4f-7c8a6d7a2f91 Charged Off 16179 Short Term 747.0 7 years Home Mortgage 96064.0 Debt Consolidation $976.65 20.2 NaN 12 0 6065 55644 0.0 0.0
256563 ff8f89a7-e168-4bcb-b678-bf11ace43813 4c516175-c4b0-4e4a-8ce1-b460bfa4c08f Charged Off 10045 Short Term NaN 1 year Home Mortgage NaN Debt Consolidation $738.31 8.1 NaN 11 0 6479 8436 0.0 0.0
256566 ff8fc7eb-0237-47b2-bfed-8bf1151f13df 35f5f178-e329-4e90-b565-8203ddfd1992 Charged Off 15781 Short Term NaN 1 year Rent NaN Debt Consolidation $1,307.83 11.2 NaN 17 0 18653 25835 0.0 0.0
256575 ff921f7d-1ef0-4f12-9a86-d6cc8fce3a3d 2278a185-7ec7-4769-a8af-205bf068c99e Charged Off 30579 Long Term NaN 5 years Home Mortgage NaN Debt Consolidation $1,048.23 23.5 NaN 9 0 19244 82950 0.0 0.0
256589 ff946889-eac4-4cb1-b245-7cf6d76c1a5e 4973925c-a4e0-40a7-aa21-98caae67949c Charged Off 13834 Long Term 699.0 8 years Rent 39384.0 Debt Consolidation $347.89 13.5 16.0 19 0 7334 26008 0.0 0.0
256611 ff9b0800-2363-4dbb-88a9-8a7b51f24a75 472c643f-e992-4a2f-b705-8357c78afdab Charged Off 14076 Short Term 712.0 2 years Rent 53688.0 Debt Consolidation $760.59 12.5 NaN 6 0 6343 9049 0.0 0.0
256652 ffa5b141-790f-4547-a9e0-f9ba3d44e405 edeb8731-845e-4f9e-8ae7-cdf7aa262dac Charged Off 24700 Short Term 715.0 3 years Own Home 138320.0 other $1,613.73 17.0 NaN 9 0 5305 19646 0.0 0.0
256690 ffb05285-380a-4aa8-8180-b55f7aac388b f82e4d81-36b3-40d6-942d-3e701f0e6650 Charged Off 12047 Short Term 744.0 2 years Home Mortgage 44172.0 Debt Consolidation $585.27 14.8 NaN 5 0 8708 9839 0.0 0.0
256723 ffb844e3-73ef-451d-b084-06dddaf8bd56 c7c1469e-3e56-4c9d-9eb6-cc94ff4f6c82 Charged Off 5858 Short Term 740.0 NaN Rent 29624.0 Debt Consolidation $419.68 16.7 74.0 13 1 9339 15335 1.0 0.0
256777 ffc6dd5c-ace9-41b0-bec3-4e5ae8567b49 f27c9d87-5e7a-4527-99f4-6e8de41e9a8c Charged Off 7855 Short Term 735.0 3 years Home Mortgage 52190.0 Debt Consolidation $1,365.64 15.9 NaN 16 0 18617 25329 0.0 0.0
256815 ffd081d8-0ce7-41cf-8318-baccaabf681a 011bbf15-53a7-4962-a5ee-f2276a0d55f2 Charged Off 4780 Short Term 748.0 3 years Rent 27885.0 Debt Consolidation $418.28 13.9 80.0 12 0 7222 32829 0.0 0.0
256825 ffd2c305-c78e-4dc2-b27c-2e46281020de 4038da05-5677-4535-a00b-1322ff9b0060 Charged Off 5553 Short Term 741.0 10+ years Rent 71395.0 Other $1,181.00 15.5 NaN 11 0 17761 24099 0.0 0.0
256844 ffda0100-49f8-4205-a853-72fd7f709328 ac055546-80ce-4f32-9d32-b2c0ca132d33 Charged Off 24978 Long Term 6890.0 5 years Home Mortgage 144870.0 Debt Consolidation $1,943.67 15.9 34.0 10 2 5563 9287 1.0 0.0
256857 ffdfbaba-e39f-46bd-b6b0-2e71f29d0f3d a814b7cf-cac0-44f8-bc33-169a107eee0a Charged Off 4893 Short Term 729.0 NaN Home Mortgage 47951.0 Debt Consolidation $439.56 15.0 NaN 5 0 6272 9784 0.0 0.0
256859 ffdfd7b0-bfa9-4b5f-bccf-cef97575011b 2e59e451-338e-45c3-8488-3656bc145506 Charged Off 14855 Long Term NaN 4 years Own Home NaN Debt Consolidation $728.65 10.9 28.0 22 0 15825 28411 0.0 0.0
256869 ffe22ad2-1f4c-4ad8-84b4-4c2ed333dcf9 e6bdcc35-8f9c-4691-bbcc-fb35cb489d10 Charged Off 13682 Long Term 6600.0 1 year Rent 39092.0 Debt Consolidation $616.68 9.0 NaN 9 0 4464 10455 0.0 0.0
256875 ffe33e62-83b6-4bc2-9ec7-7f865621542e 307dc053-1357-40ac-a394-ad916ecca8ca Charged Off 1480 Short Term 7350.0 4 years Rent 34528.0 Debt Consolidation $417.21 6.5 NaN 13 0 2792 6997 0.0 0.0
256892 ffe9a992-ed0a-4a80-bb8e-91e8fd778bec 53fbf0c5-0852-47a8-9860-6418fe21dd56 Charged Off 10784 Short Term 721.0 < 1 year Rent 24581.0 Debt Consolidation $559.21 14.8 NaN 9 0 18710 20674 0.0 0.0
256917 ffef32fb-8f16-496f-88fa-08929936267d 507f6a07-19f9-4e73-9064-046c445c45b4 Charged Off 18331 Short Term 733.0 3 years Own Home 101840.0 Debt Consolidation $3,190.98 18.3 NaN 19 0 14696 20874 0.0 0.0
256969 fffc4837-d6f7-48c7-bc61-78422af92eb6 01511702-6e9f-490e-9fef-0a9dbf1c0e20 Charged Off 10782 Short Term 738.0 5 years Rent 36787.0 Debt Consolidation $732.67 9.5 NaN 8 0 13415 16440 0.0 0.0
256972 fffd5218-faf0-4b82-becf-11664fa064a3 301954fb-2381-4d68-a3b0-478870cd49f3 Charged Off 5968 Short Term 746.0 8 years Own Home 79568.0 Debt Consolidation $1,412.33 22.1 63.0 10 1 4237 0 0.0 0.0
256976 fffe5140-8725-4fd1-b3bf-9e59cb7834eb df856c29-ce44-4df7-94a6-bd758a1c1332 Charged Off 34710 Short Term 700.0 9 years Rent 148755.0 Business Loan $47.11 34.0 1.0 3 0 2590 37004 0.0 0.0

16610 rows × 19 columns

In [3]:
df[df['Loan ID']=='00029f9f-0cc5-4d4e-aabc-ea4a7fe74e12']
Out[3]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose Monthly Debt Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens
12 00029f9f-0cc5-4d4e-aabc-ea4a7fe74e12 afbc2fa3-3bad-4d48-b691-829aed78bad5 Charged Off 17980 Short Term NaN < 1 year Own Home NaN Debt Consolidation $597.50 9.9 43.0 7 0 6817 22800 0.0 0.0
13 00029f9f-0cc5-4d4e-aabc-ea4a7fe74e12 afbc2fa3-3bad-4d48-b691-829aed78bad5 Charged Off 17961 Short Term 701.0 < 1 year Own Home 53053.0 Debt Consolidation $596.85 9.9 43.0 7 0 6810 22775 0.0 0.0
14 00029f9f-0cc5-4d4e-aabc-ea4a7fe74e12 afbc2fa3-3bad-4d48-b691-829aed78bad5 Charged Off 17961 Short Term 701.0 < 1 year Own Home 53053.0 Debt Consolidation $596.85 9.9 43.0 7 0 6810 22775 0.0 0.0
15 00029f9f-0cc5-4d4e-aabc-ea4a7fe74e12 afbc2fa3-3bad-4d48-b691-829aed78bad5 Charged Off 17980 Short Term 701.0 < 1 year Own Home 53112.0 Debt Consolidation $597.50 9.9 43.0 7 0 6817 22800 0.0 0.0
In [4]:
no_dup = df.copy()

Remove duplicate rows

In [5]:
no_dup.drop_duplicates(keep = 'first',inplace = True)
In [6]:
no_dup[no_dup['Loan ID']=='00029f9f-0cc5-4d4e-aabc-ea4a7fe74e12']
Out[6]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose Monthly Debt Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens
12 00029f9f-0cc5-4d4e-aabc-ea4a7fe74e12 afbc2fa3-3bad-4d48-b691-829aed78bad5 Charged Off 17980 Short Term NaN < 1 year Own Home NaN Debt Consolidation $597.50 9.9 43.0 7 0 6817 22800 0.0 0.0
13 00029f9f-0cc5-4d4e-aabc-ea4a7fe74e12 afbc2fa3-3bad-4d48-b691-829aed78bad5 Charged Off 17961 Short Term 701.0 < 1 year Own Home 53053.0 Debt Consolidation $596.85 9.9 43.0 7 0 6810 22775 0.0 0.0
15 00029f9f-0cc5-4d4e-aabc-ea4a7fe74e12 afbc2fa3-3bad-4d48-b691-829aed78bad5 Charged Off 17980 Short Term 701.0 < 1 year Own Home 53112.0 Debt Consolidation $597.50 9.9 43.0 7 0 6817 22800 0.0 0.0
In [7]:
no_dup[no_dup['Loan ID'].duplicated()==True]
Out[7]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose Monthly Debt Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens
7 0000afa6-8902-4f8f-b870-25a8fdad0aeb e49c1a82-a0f7-45e8-9f46-2f75c43f9fbc Charged Off 24613 Long Term NaN 6 years Rent NaN Business Loan $542.29 17.6 73.0 7 0 14123 16954 0.0 0.0
13 00029f9f-0cc5-4d4e-aabc-ea4a7fe74e12 afbc2fa3-3bad-4d48-b691-829aed78bad5 Charged Off 17961 Short Term 701.0 < 1 year Own Home 53053.0 Debt Consolidation $596.85 9.9 43.0 7 0 6810 22775 0.0 0.0
15 00029f9f-0cc5-4d4e-aabc-ea4a7fe74e12 afbc2fa3-3bad-4d48-b691-829aed78bad5 Charged Off 17980 Short Term 701.0 < 1 year Own Home 53112.0 Debt Consolidation $597.50 9.9 43.0 7 0 6817 22800 0.0 0.0
21 00038a08-f058-4add-a8ed-497b91672a9e 727bb429-dfa1-41c4-a347-23230e23949f Charged Off 16929 Long Term 734.0 3 years Home Mortgage 40308.0 Debt Consolidation $745.70 14.9 NaN 10 0 28513 101110 0.0 0.0
47 000a08fa-e669-46dc-9a96-9da6b96570a0 4b7782a6-b84c-4dd9-ac62-f927a68eeaa9 Charged Off 21160 Long Term 636.0 5 years Home Mortgage 77770.0 Debt Consolidation $1,290.98 20.8 NaN 28 0 28175 51227 0.0 0.0
56 000c0c27-1961-4336-b760-a300e1bce9b0 881edf98-387c-4eef-ab71-317a7a5e70c1 Charged Off 6463 Short Term NaN 1 year Own Home NaN Debt Consolidation $633.29 19.9 74.0 9 0 8025 24317 0.0 0.0
72 000f1140-ef9f-42f9-9898-f9b3162bf0da 442c2d9b-3d95-44a9-bcb4-0dc80a941e3c Charged Off 6007 Short Term NaN 10+ years Home Mortgage NaN Debt Consolidation $538.15 23.6 13.0 6 0 5077 7811 0.0 0.0
74 000f5c35-dc3e-415f-9c22-e355ef96d78b 0ecf29b6-93eb-4c13-b853-a9cbdfc7f85b Charged Off 12899 Long Term 667.0 2 years Rent 133947.0 Debt Consolidation $1,931.07 18.4 7.0 16 0 39417 90199 0.0 0.0
80 001227a1-4490-47c6-ada7-20faf636e9a3 3d4a8985-0b72-4084-8b20-f441d9868483 Charged Off 5060 Short Term 727.0 < 1 year Home Mortgage 62738.0 Other $711.03 20.4 22.0 6 0 10777 12041 0.0 0.0
85 00158423-d7a4-4fef-8e0d-ad8de93e8ba1 11e878dc-0982-4c6b-a32a-ef5fca6ed6f4 Charged Off 11546 Long Term 707.0 9 years Rent 33551.0 Debt Consolidation $522.84 22.7 NaN 17 0 12113 15630 0.0 0.0
89 00163613-1360-4e43-8fbc-7818032510b7 2eba693e-f810-459d-8df7-9a2376138466 Charged Off 5041 Short Term 737.0 2 years Rent 51147.0 Debt Consolidation $613.77 10.9 NaN 11 0 3644 4428 0.0 0.0
113 001ce813-522a-4a34-bbcc-21eaeae79aea cc73a15c-7ad4-4101-a2af-57306c335285 Charged Off 15191 Long Term NaN 7 years Rent NaN Debt Consolidation $385.07 14.0 NaN 11 0 15655 24810 0.0 0.0
114 001ce813-522a-4a34-bbcc-21eaeae79aea cc73a15c-7ad4-4101-a2af-57306c335285 Charged Off 15191 Long Term 706.0 7 years Rent 44863.0 Debt Consolidation $385.07 14.0 NaN 11 0 15655 24810 0.0 0.0
115 001ce813-522a-4a34-bbcc-21eaeae79aea cc73a15c-7ad4-4101-a2af-57306c335285 Charged Off 15306 Long Term 706.0 7 years Rent 45204.0 Debt Consolidation $388.00 14.0 NaN 11 0 15774 24999 0.0 0.0
125 00201656-d70d-49cf-b5b0-118280383288 0d419e30-7336-4e52-980d-8c17365acc67 Charged Off 9663 Short Term 743.0 2 years Home Mortgage 42277.0 Debt Consolidation $806.79 13.3 81.0 14 0 11990 16008 0.0 0.0
127 00205bb2-b17c-4001-84ac-c6d928e4eaa6 45aa9267-04f2-49b5-8991-7bb1ffc8f669 Charged Off 14834 Short Term 741.0 3 years Home Mortgage 98890.0 Debt Consolidation $2,398.08 16.0 12.0 8 0 5810 5736 0.0 0.0
134 00221528-22cc-422d-ace3-71a4c5a43e59 79791672-07e1-423e-bc3e-beecee96e4a6 Charged Off 3278 Short Term 733.0 NaN Rent 39948.0 Medical Bills $581.90 21.5 78.0 10 0 10392 14042 0.0 0.0
151 0026a1b5-8a22-478d-a6fe-5931fa8ed673 88e7f72d-e0f0-423b-86bc-0b05f023ca7e Charged Off 11017 Short Term NaN 2 years Rent NaN Debt Consolidation $1,280.23 11.7 NaN 9 0 7140 18788 0.0 0.0
152 0026a1b5-8a22-478d-a6fe-5931fa8ed673 88e7f72d-e0f0-423b-86bc-0b05f023ca7e Charged Off 11017 Short Term 740.0 2 years Rent 61206.0 Debt Consolidation $1,280.23 11.7 NaN 9 0 7140 18788 0.0 0.0
162 00291dfe-e40a-4791-acc8-a322ee3d9b72 835f15b4-ccea-46fb-9a1f-1ccda42ae737 Charged Off 6017 Short Term NaN 4 years Rent NaN other $448.63 8.9 NaN 12 0 7263 19736 0.0 0.0
171 002ae016-b028-483d-91fa-ab351bd326ce 2bd8d9b0-989f-40ae-be5a-9bee5cf645f5 Charged Off 16122 Long Term 699.0 10+ years Rent 45342.0 Debt Consolidation $1,031.53 20.4 52.0 16 1 7780 17327 0.0 0.0
176 002bd679-cca8-4637-9e21-f07117547ade ce4134c2-3656-4b45-8a98-7c90ae95ef2e Charged Off 6840 Short Term NaN < 1 year Rent NaN Buy House $2,208.06 26.4 60.0 20 0 7983 10951 0.0 0.0
204 0033afa0-9ee0-4b80-8583-84fd5df32cdd cdfc5fc0-0107-4770-b66d-dcfac1f08a38 Charged Off 9931 Long Term 700.0 NaN Rent 34560.0 other $743.04 23.0 NaN 9 0 3711 12011 0.0 0.0
226 003a3241-adbe-4337-b73a-5085c6834a74 08117601-9319-41e3-9fa6-6da917c70fa5 Charged Off 7253 Short Term 744.0 < 1 year Rent 45329.0 Debt Consolidation $30.22 15.9 0.0 4 0 3356 10968 NaN 0.0
234 003b2a04-c602-4685-bbe1-b049c6d208c8 c281b763-fd5e-4fe1-a294-27965ffba250 Charged Off 32889 Short Term 664.0 2 years Rent 188811.0 Home Improvements $2,722.02 8.0 NaN 8 0 3304 5715 0.0 0.0
252 0041f6e0-7b30-45db-b796-44fe0f5defa8 c9046850-92e6-4024-8a43-a7bdab580065 Charged Off 1530 Short Term 672.0 8 years Rent 43856.0 other $1,180.45 16.0 NaN 11 0 12135 17638 0.0 0.0
261 00430797-c7fa-4cdc-b1f5-1f777de86d4a e39d2d77-a389-44b9-a0bd-82f112bf3777 Charged Off 29061 Long Term 700.0 5 years Rent 66281.0 Debt Consolidation $1,231.72 14.5 NaN 13 0 37839 59402 0.0 0.0
263 00430b50-f36d-4abd-9e96-29998ee4150a 29ebed6e-e0d7-444b-8f9d-af6727d6b5c0 Charged Off 14423 Short Term 719.0 2 years Rent 34354.0 Debt Consolidation $755.78 14.5 NaN 22 0 14608 16562 0.0 0.0
270 0044a2d5-3ffa-4103-bc5a-d1c9b77e64ac 70d9d527-c3dc-45dc-9661-de97eb5ac94f Charged Off 16381 Short Term 709.0 10+ years Home Mortgage 60576.0 Debt Consolidation $1,049.98 20.4 36.0 8 1 11089 19592 1.0 0.0
271 0044a2d5-3ffa-4103-bc5a-d1c9b77e64ac 70d9d527-c3dc-45dc-9661-de97eb5ac94f Charged Off 16356 Short Term 709.0 10+ years Home Mortgage 60486.0 Debt Consolidation $1,048.42 20.4 36.0 8 1 11073 19563 1.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
256735 ffbc24f2-d57b-4196-9787-78016f0e435a c01d9949-9ec5-429d-8064-e007e0b20c67 Charged Off 20723 Long Term 695.0 10+ years Home Mortgage 52785.0 Debt Consolidation $967.73 28.3 NaN 11 0 14730 17391 0.0 0.0
256742 ffbe5e46-dd9b-4a5e-937a-cb173c667cfa 3bebc2f0-8618-41b4-bd6c-b98bc48f096a Charged Off 30000 Long Term 686.0 10+ years Own Home 89109.0 Debt Consolidation $2,398.52 20.4 16.0 8 0 41401 48029 0.0 0.0
256747 ffbf871d-5f1d-49e2-b5f7-f9a0995e5089 62a7273b-6bd6-44b0-9b3a-327754b063ec Charged Off 24502 Long Term 718.0 < 1 year Home Mortgage 98009.0 Debt Consolidation $1,462.79 19.6 NaN 4 0 25823 28283 0.0 0.0
256750 ffc07baf-5656-49df-80a6-d753a7a71bcf fa57bad2-238f-4172-afa5-2fae4deff911 Charged Off 10225 Long Term 717.0 10+ years Home Mortgage 86913.0 Home Improvements $811.18 21.2 18.0 7 0 5502 6443 0.0 0.0
256760 ffc3506e-4f06-4233-abb2-85efff9b412b bd1310e2-e9e4-471f-bf8d-1f3e72e010d5 Charged Off 18580 Long Term 723.0 7 years Rent 47917.0 Debt Consolidation $690.81 29.5 5.0 12 0 15800 30620 0.0 0.0
256765 ffc41c13-790b-43c6-8aa8-b4bb96ec53ce 313e03f7-487c-4e66-9c65-e2fbd625baf1 Charged Off 30129 Long Term 690.0 10+ years Own Home 75323.0 Debt Consolidation $1,060.79 13.5 NaN 15 0 22114 37930 0.0 0.0
256781 ffc782cf-549a-4b7b-b930-c5da2191154d 899532d0-ae51-49b7-ba08-b66ead574374 Charged Off 8110 Short Term 727.0 1 year Rent 24331.0 Debt Consolidation $490.68 9.4 NaN 5 0 9817 17752 0.0 0.0
256793 ffcaa10f-5560-4737-b72a-8fb4c91d63c9 13d36ee8-a841-43d5-be8e-60ce1e27bc22 Charged Off 7888 Short Term 712.0 10+ years Own Home 59160.0 other $739.50 20.8 NaN 14 2 5464 7784 2.0 0.0
256801 ffcd5c4c-c146-450e-b535-3f8fee1bbaf4 27019cfd-28a8-4a94-8a2a-33bba21e37e9 Charged Off 20012 Short Term 719.0 2 years Rent 82705.0 Debt Consolidation $1,026.92 19.4 NaN 11 0 36471 80866 0.0 0.0
256804 ffce37ae-4877-4316-b985-ca66e9942861 93330384-1e1d-4f78-8d33-766b40040231 Charged Off 10605 Short Term NaN 10+ years Home Mortgage NaN Debt Consolidation $1,164.08 14.5 NaN 16 0 18313 21750 0.0 0.0
256807 ffce7776-d6f5-43d7-8a51-766ddad34b8a df8ba181-9a11-4f13-8fc0-2af26fc1fb48 Charged Off 1589 Short Term 726.0 1 year Rent 64167.0 Debt Consolidation $320.30 15.2 NaN 4 0 2745 6460 0.0 0.0
256822 ffd1aa2b-4231-4cae-82fc-9f182e9471b4 3e1dd868-496c-4893-b323-3e908163aa6e Charged Off 17509 Long Term 676.0 2 years Rent 37392.0 Debt Consolidation $709.52 18.5 NaN 9 1 22965 23748 1.0 0.0
256864 ffe07979-874c-4f31-a8c3-550acaa6922e b36cdf08-5a97-4073-b4da-abeb56f9d0fa Charged Off 11569 Short Term 739.0 NaN Home Mortgage 50427.0 Debt Consolidation $748.00 11.0 71.0 10 0 6587 10292 0.0 0.0
256879 ffe48901-0341-436b-bc10-8bb50b0e2b7f 1f942698-0001-4676-b3f6-8b21bd7ca191 Charged Off 15352 Short Term 653.0 8 years Rent 300030.0 Business Loan $3,200.32 13.7 NaN 9 2 6798 8898 0.0 2.0
256894 ffea733e-2ec3-45cd-ba92-ccf97037f5e6 869118c0-5868-4acc-8bd2-7ccfc8695cc3 Charged Off 35025 Long Term 652.0 6 years Home Mortgage 78055.0 Home Improvements $780.55 10.8 NaN 9 0 14180 19425 0.0 0.0
256901 ffebcaee-5c82-4557-87dd-01f4b8ffc932 6a8860b3-b2eb-49c5-942d-2ef276baee92 Charged Off 20260 Long Term 718.0 9 years Home Mortgage 65845.0 Debt Consolidation $905.37 11.6 NaN 18 0 31082 41005 0.0 0.0
256903 ffebec60-7520-43d6-a392-1f5c30b6d07a 9b2098ea-7ac4-4479-a524-ce208004c70f Charged Off 16201 Long Term 699.0 NaN Own Home 35221.0 Debt Consolidation $1,091.84 32.9 NaN 19 0 31101 88104 0.0 0.0
256911 ffedd0f5-9ce5-4136-a72d-78e7310deea0 64278634-4e52-4b4d-beb7-d90f533a77a4 Charged Off 12089 Short Term 712.0 4 years Rent 110814.0 Debt Consolidation $1,708.38 11.4 25.0 10 0 6925 9978 0.0 0.0
256924 fff08eca-26b5-4d02-b618-c1cf46736896 09106d18-88cd-401a-ab07-9c93599138c6 Charged Off 24595 Long Term 669.0 1 year Home Mortgage 59028.0 Debt Consolidation $855.91 21.0 21.0 10 0 17282 18040 0.0 0.0
256928 fff142bd-1430-4bb9-9eac-a5b45436a5b4 819e9ff5-f8d4-4c6f-b5c1-9d274b93f159 Charged Off 24658 Long Term NaN 9 years Rent NaN Debt Consolidation $1,060.27 20.1 21.0 12 0 3718 18317 0.0 0.0
256929 fff142bd-1430-4bb9-9eac-a5b45436a5b4 819e9ff5-f8d4-4c6f-b5c1-9d274b93f159 Charged Off 24503 Long Term 696.0 9 years Rent 98010.0 Debt Consolidation $1,053.61 20.1 21.0 12 0 3695 18201 0.0 0.0
256930 fff142bd-1430-4bb9-9eac-a5b45436a5b4 819e9ff5-f8d4-4c6f-b5c1-9d274b93f159 Charged Off 24658 Long Term 696.0 9 years Rent 98630.0 Debt Consolidation $1,060.27 20.1 21.0 12 0 3718 18317 0.0 0.0
256933 fff1c0cc-49ce-47f6-a0c1-e74a7d43fbe1 cb142142-3ff3-4ffe-87ed-4c0dd9fabb67 Charged Off 9681 Short Term 732.0 8 years Home Mortgage 33759.0 Other $953.68 16.0 NaN 17 0 24900 29857 0.0 0.0
256937 fff2779e-9fe7-4db5-b566-48f43b58aa68 7ddd16c2-bf65-4b01-8b48-bbda8793c206 Charged Off 9976 Short Term 733.0 1 year Home Mortgage 124700.0 Debt Consolidation $1,340.53 10.4 16.0 16 0 6539 17074 0.0 0.0
256947 fff5f918-967f-4cae-b8e8-0ec41b9d0a69 92093f79-1142-496c-8aef-0bc152ed8dd9 Charged Off 20658 Short Term 745.0 10+ years Rent 45963.0 Debt Consolidation $792.86 17.4 NaN 10 0 13192 36143 0.0 0.0
256949 fff61503-051a-4bb8-8342-287939fa738a b36620f7-d086-45c4-bfdc-b14c2187d0c6 Charged Off 20492 Short Term 722.0 5 years Rent 52511.0 Debt Consolidation $1,163.99 19.0 NaN 12 0 18732 24391 0.0 0.0
256966 fffc297d-74f0-4d42-ad90-cdea2bd9aac0 b8cae6f0-6c9b-4982-bef2-e3cdf48add59 Charged Off 11524 Short Term 725.0 6 years Home Mortgage 50990.0 Debt Consolidation $790.35 21.4 NaN 23 0 12818 19160 0.0 0.0
256974 fffe36ec-97a6-460a-a236-640b7e0669f0 6520de3e-5961-47b1-bced-6a8132acb5cc Charged Off 10575 Long Term 722.0 < 1 year Rent 30854.0 Debt Consolidation $707.08 14.8 NaN 18 0 20183 23496 0.0 0.0
256978 fffe84cf-67a1-468d-ab7b-cf06a9c4ee14 460c6cf5-4d12-4e3c-a3d0-46dd67c78cce Charged Off 11953 Short Term 717.0 10+ years Home Mortgage 39844.0 Debt Consolidation $982.82 11.7 NaN 9 1 4176 4783 1.0 0.0
256982 ffffcb2e-e48e-4d2c-a0d6-ed6bce5bfdbe 971a6682-183b-4a52-8bce-1d3429ade295 Charged Off 12116 Short Term 746.0 9 years Home Mortgage 52504.0 Debt Consolidation $297.96 15.1 82.0 8 0 3315 20090 0.0 0.0

24674 rows × 19 columns

In [8]:
df['Loan Status'].unique()
Out[8]:
array(['Fully Paid', 'Charged Off'], dtype=object)
In [9]:
no_dup[no_dup['Loan ID']=='0000afa6-8902-4f8f-b870-25a8fdad0aeb']
Out[9]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose Monthly Debt Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens
6 0000afa6-8902-4f8f-b870-25a8fdad0aeb e49c1a82-a0f7-45e8-9f46-2f75c43f9fbc Charged Off 24613 Long Term 6640.0 6 years Rent 49225.0 Business Loan $542.29 17.6 73.0 7 0 14123 16954 0.0 0.0
7 0000afa6-8902-4f8f-b870-25a8fdad0aeb e49c1a82-a0f7-45e8-9f46-2f75c43f9fbc Charged Off 24613 Long Term NaN 6 years Rent NaN Business Loan $542.29 17.6 73.0 7 0 14123 16954 0.0 0.0
In [10]:
no_dup['null_count'] = no_dup.isnull().sum(axis=1)
no_dup = no_dup.sort_values('null_count', ascending=True).drop('null_count', axis=1)
In [11]:
no_dup.head(20)
Out[11]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose Monthly Debt Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens
0 000025bb-5694-4cff-b17d-192b1a98ba44 5ebc8bb1-5eb9-4404-b11b-a6eebc401a19 Fully Paid 11520 Short Term 741.0 10+ years Home Mortgage 33694.0 Debt Consolidation $584.03 12.3 41.0 10 0 6760 16056 0.0 0.0
128265 7f99634c-cc18-4667-92b4-b9e887b7c296 e52aac79-e3f0-44cf-b538-539c0a01c849 Fully Paid 7055 Short Term 742.0 10+ years Home Mortgage 90711.0 Debt Consolidation $1,322.87 31.9 11.0 21 0 18290 19752 0.0 0.0
128263 7f99378c-b36e-4711-9af2-33c7239c00e7 32bdbd23-ca78-4c14-a0da-afc2a4e7a33a Fully Paid 99999999 Long Term 694.0 8 years Home Mortgage 110549.0 Debt Consolidation $1,824.06 21.6 13.0 18 0 40907 52513 0.0 0.0
128259 7f96d054-5de3-4f5f-9a21-d54ee9af703b 784a48cb-96ee-494c-aa7b-eb388fbf3003 Fully Paid 2745 Short Term 743.0 2 years Home Mortgage 34318.0 Debt Consolidation $506.18 16.5 68.0 6 0 5429 10501 0.0 0.0
128256 7f9558b3-4ed7-4fe0-8662-8021c53c0016 0971186f-b0a7-4d11-ba55-4b41348673a2 Fully Paid 3008 Short Term 750.0 2 years Rent 37104.0 Debt Consolidation $556.55 13.8 34.0 11 0 732 12621 0.0 0.0
128254 7f95588d-57ba-4e00-988e-6bca2f2d32ec 3faf854b-3530-4922-96d7-92d95ad9ded0 Charged Off 7077 Short Term 7400.0 3 years Home Mortgage 47179.0 Debt Consolidation $1,336.74 12.9 5.0 11 0 16689 18879 0.0 0.0
128253 7f943f01-36ea-4660-81da-82a2a3ae333f f07fc1c0-6ceb-4bf2-8066-99d485d21eee Fully Paid 7267 Short Term 717.0 10+ years Home Mortgage 78032.0 Debt Consolidation $994.91 22.4 55.0 12 1 6919 7775 1.0 0.0
128251 7f943e2c-33e5-444a-bfce-ff83b0d44036 b3173654-f680-4d83-9e18-451acd7cdd2d Charged Off 23522 Long Term 6910.0 10+ years Home Mortgage 70567.0 Debt Consolidation $882.09 13.1 17.0 12 0 6401 21196 0.0 0.0
128244 7f901e9c-0e09-4561-9b4d-949ef88a32f0 20f4d09f-f271-4e31-a23f-77c4f7ac82bf Fully Paid 19554 Short Term 740.0 10+ years Rent 53774.0 Debt Consolidation $1,030.66 29.0 14.0 10 0 26450 59706 0.0 0.0
128242 7f8fea60-ef3a-409b-84b8-f0f9e501293b f3c67c03-d825-451d-a1b1-a3c9ec3757a3 Fully Paid 99999999 Short Term 745.0 6 years Home Mortgage 55258.0 other $915.90 25.0 81.0 9 0 16559 19760 0.0 0.0
128241 7f8fdf3e-54e6-46c6-b9e7-8578d376438b fe9a45e1-d046-454d-8373-8e6597036316 Fully Paid 17707 Short Term 743.0 10+ years Home Mortgage 79434.0 Debt Consolidation $807.58 26.9 36.0 10 0 15111 23501 0.0 0.0
128239 7f8ede6b-7b07-4ef6-b32f-5de56b39ad90 047bcc6d-cf8a-448c-aba9-80fe88ab9e37 Fully Paid 10026 Short Term 731.0 1 year Home Mortgage 65169.0 Debt Consolidation $667.98 20.8 40.0 13 0 17769 28160 0.0 0.0
128266 7f997992-62a9-4c5f-a611-68ad0f655469 c831a6d5-d78f-42ca-8592-db7ce4a0ee62 Fully Paid 99999999 Long Term 694.0 10+ years Own Home 864258.0 Debt Consolidation $5,912.97 18.4 6.0 21 0 63466 93746 0.0 0.0
128234 7f8de056-3a6d-4ace-901b-27fb473e04a4 d4cc2de5-c43f-4941-bedb-5178e26d4b9a Fully Paid 9621 Short Term 740.0 10+ years Rent 64731.0 Debt Consolidation $760.59 23.9 12.0 11 0 5243 11625 0.0 0.0
128228 7f8bd8c2-c195-4425-8e27-a8b0f390b386 0562ab32-f8cd-4334-963b-16d6352168c3 Fully Paid 7806 Short Term 732.0 3 years Home Mortgage 29742.0 Debt Consolidation $401.52 9.5 13.0 12 0 5143 21703 0.0 0.0
128227 7f8b8fbf-37b7-4ee1-a873-7c85dfcd1f2f 492f37b2-91e6-4e81-b9f3-5cce922c758e Fully Paid 11121 Short Term 732.0 2 years Rent 49034.0 Debt Consolidation $976.59 13.3 78.0 9 0 19484 31941 0.0 0.0
128226 7f8ababf-72ab-4119-b1b6-428b336c69df 0b6cf0a4-baa3-4082-a6c8-3f2215e41611 Fully Paid 7429 Short Term 741.0 10+ years Home Mortgage 54478.0 Debt Consolidation $1,457.27 32.5 40.0 21 2 8296 32535 0.0 2.0
128217 7f8a04ad-214a-474f-b6bd-aeafd2acab8c ebcec38c-a064-404b-8bc7-16dc616f5920 Fully Paid 11461 Short Term 750.0 10+ years Own Home 102330.0 Other $1,227.96 38.5 8.0 12 0 61587 249341 0.0 0.0
128215 7f89d5df-1397-431f-8075-f97d7010e932 30d2930b-6d86-4d27-afb1-ae7fcd61d2ec Fully Paid 99999999 Short Term 743.0 5 years Rent 58277.0 Debt Consolidation $845.01 14.4 18.0 10 0 10417 21932 0.0 0.0
128212 7f893d8b-d5b1-42ee-856b-1036b81642ad dc92390e-6486-4c28-929a-3a48ff48037f Fully Paid 9909 Short Term 740.0 4 years Own Home 43600.0 Debt Consolidation $1,046.39 13.8 28.0 11 0 5353 11389 0.0 0.0
In [13]:
no_dup.drop_duplicates(subset = 'Loan ID',keep = 'first',inplace = True)
In [14]:
no_dup[no_dup['Loan ID'].duplicated()==True]
Out[14]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose Monthly Debt Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens

Analyze the new dataframe with duplicates removed

In [15]:
no_dup.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 215700 entries, 0 to 25243
Data columns (total 19 columns):
Loan ID                         215700 non-null object
Customer ID                     215700 non-null object
Loan Status                     215700 non-null object
Current Loan Amount             215700 non-null int64
Term                            215700 non-null object
Credit Score                    171202 non-null float64
Years in current job            206710 non-null object
Home Ownership                  215700 non-null object
Annual Income                   171202 non-null float64
Purpose                         215700 non-null object
Monthly Debt                    215700 non-null object
Years of Credit History         215700 non-null float64
Months since last delinquent    97438 non-null float64
Number of Open Accounts         215700 non-null int64
Number of Credit Problems       215700 non-null int64
Current Credit Balance          215700 non-null int64
Maximum Open Credit             215700 non-null object
Bankruptcies                    215248 non-null float64
Tax Liens                       215678 non-null float64
dtypes: float64(6), int64(4), object(9)
memory usage: 32.9+ MB
In [17]:
#looking at the null values in my dataframe
no_dup.isnull().sum()
Out[17]:
Loan ID                              0
Customer ID                          0
Loan Status                          0
Current Loan Amount                  0
Term                                 0
Credit Score                     44498
Years in current job              8990
Home Ownership                       0
Annual Income                    44498
Purpose                              0
Monthly Debt                         0
Years of Credit History              0
Months since last delinquent    118262
Number of Open Accounts              0
Number of Credit Problems            0
Current Credit Balance               0
Maximum Open Credit                  0
Bankruptcies                       452
Tax Liens                           22
dtype: int64
In [18]:
no_dup.describe()
Out[18]:
Current Loan Amount Credit Score Annual Income Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Bankruptcies Tax Liens
count 2.157000e+05 171202.000000 1.712020e+05 215700.000000 97438.000000 215700.000000 215700.000000 2.157000e+05 215248.000000 215678.000000
mean 1.633515e+07 1035.265114 7.280452e+04 18.359758 35.031949 11.077566 0.155424 1.545723e+04 0.110556 0.026331
std 3.695307e+07 1377.767153 5.730483e+04 7.057999 21.810714 4.971825 0.457972 1.975044e+04 0.336853 0.240853
min 5.050000e+02 585.000000 0.000000e+00 3.400000 0.000000 0.000000 0.000000 0.000000e+00 0.000000 0.000000
25% 8.328000e+03 715.000000 4.476600e+04 13.600000 17.000000 8.000000 0.000000 5.932000e+03 0.000000 0.000000
50% 1.464200e+04 733.000000 6.210500e+04 17.000000 32.000000 10.000000 0.000000 1.104200e+04 0.000000 0.000000
75% 2.520800e+04 743.000000 8.779200e+04 21.800000 51.000000 14.000000 0.000000 1.932400e+04 0.000000 0.000000
max 1.000000e+08 7510.000000 8.713547e+06 70.500000 176.000000 76.000000 11.000000 1.731412e+06 7.000000 11.000000

Fill in missing values

In [19]:
filledScore = no_dup[no_dup['Credit Score'].notnull()]
plt.hist(filledScore['Credit Score'], bins = 50)
plt.show()
In [20]:
df_cleaned =no_dup.copy()
df_cleaned['Credit Score']= no_dup['Credit Score'].fillna(no_dup['Credit Score'].median())
In [21]:
df_cleaned.head()
Out[21]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose Monthly Debt Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens
0 000025bb-5694-4cff-b17d-192b1a98ba44 5ebc8bb1-5eb9-4404-b11b-a6eebc401a19 Fully Paid 11520 Short Term 741.0 10+ years Home Mortgage 33694.0 Debt Consolidation $584.03 12.3 41.0 10 0 6760 16056 0.0 0.0
128265 7f99634c-cc18-4667-92b4-b9e887b7c296 e52aac79-e3f0-44cf-b538-539c0a01c849 Fully Paid 7055 Short Term 742.0 10+ years Home Mortgage 90711.0 Debt Consolidation $1,322.87 31.9 11.0 21 0 18290 19752 0.0 0.0
128263 7f99378c-b36e-4711-9af2-33c7239c00e7 32bdbd23-ca78-4c14-a0da-afc2a4e7a33a Fully Paid 99999999 Long Term 694.0 8 years Home Mortgage 110549.0 Debt Consolidation $1,824.06 21.6 13.0 18 0 40907 52513 0.0 0.0
128259 7f96d054-5de3-4f5f-9a21-d54ee9af703b 784a48cb-96ee-494c-aa7b-eb388fbf3003 Fully Paid 2745 Short Term 743.0 2 years Home Mortgage 34318.0 Debt Consolidation $506.18 16.5 68.0 6 0 5429 10501 0.0 0.0
128256 7f9558b3-4ed7-4fe0-8662-8021c53c0016 0971186f-b0a7-4d11-ba55-4b41348673a2 Fully Paid 3008 Short Term 750.0 2 years Rent 37104.0 Debt Consolidation $556.55 13.8 34.0 11 0 732 12621 0.0 0.0
In [22]:
df_cleaned['Years in current job'].unique()
Out[22]:
array(['10+ years', '8 years', '2 years', '3 years', '6 years', '1 year',
       '5 years', '4 years', '7 years', '< 1 year', '9 years', nan],
      dtype=object)
In [23]:
df_cleaned['Years in current job'].mode()
Out[23]:
0    10+ years
dtype: object
In [24]:
df_cleaned[df_cleaned['Years in current job'].isnull()==True]
Out[24]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose Monthly Debt Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens
173280 ac42556a-621c-4efe-accd-81b0f8b88763 a2a872f7-7f3c-42c1-b293-c47d1c8dcaf1 Fully Paid 14413 Short Term 742.0 NaN Home Mortgage 62056.0 Home Improvements $1,054.95 44.3 40.0 11 0 15763 20209 0.0 0.0
173534 ac805fbc-eed9-4358-acfe-0483ab2bdc4f 5e18f2ba-6b6b-4d8d-8d17-5e7505f5f2fc Fully Paid 99999999 Short Term 743.0 NaN Rent 64333.0 Take a Trip $456.23 20.3 35.0 7 0 4361 8954 0.0 0.0
174092 ad06660e-ea3c-4065-9a5d-fb5a48a844af f3f9b937-67ba-4e75-8090-131bc7f8cbb7 Fully Paid 99999999 Short Term 748.0 NaN Home Mortgage 87084.0 Debt Consolidation $957.92 17.0 27.0 17 0 12442 57869 0.0 0.0
26896 1ae0e1f4-60c6-4057-b7f0-ff7d658c9d8d 80b73ac5-4596-4fdd-a8b6-c5fc6a549100 Charged Off 11797 Short Term 746.0 NaN Own Home 49155.0 Debt Consolidation $1,151.04 27.0 61.0 10 0 8812 20930 0.0 0.0
173682 aca2383d-d3cb-46e7-97bd-bc0805105db0 fa638ab0-0f90-40a2-8538-bd6822d4fb13 Fully Paid 99999999 Short Term 721.0 NaN Rent 63104.0 Debt Consolidation $678.36 20.0 21.0 8 1 8763 20145 1.0 0.0
173329 ac4f9423-eb2c-490f-94a5-0c1effa6a797 9475ae94-3944-41b0-a533-d925d58e4f23 Fully Paid 4080 Short Term 676.0 NaN Rent 45900.0 Business Loan $284.58 27.1 61.0 7 1 4969 13359 1.0 0.0
26937 1aed3d3c-2900-4d7c-9d60-9c91c1bb950a 37d905fe-cbc7-4df3-bf38-deff00d68cf6 Fully Paid 2371 Short Term 744.0 NaN Rent 8179.0 Debt Consolidation $55.35 23.5 34.0 4 0 1898 12822 0.0 0.0
173448 ac6c1ced-3d87-4e5a-937a-30087175e003 3ebc9323-fd95-4e37-aed2-1e233dbe1b32 Charged Off 9836 Short Term 7000.0 NaN Home Mortgage 39344.0 Debt Consolidation $439.34 22.0 0.0 10 0 5255 21450 0.0 0.0
173426 ac68d02f-03c0-47bd-9913-4072a6c41bb6 902b24c3-47bc-4a24-adee-cc2b2fc4b0ec Fully Paid 99999999 Short Term 735.0 NaN Home Mortgage 82002.0 Business Loan $1,366.70 25.8 32.0 12 1 14573 24826 1.0 0.0
173472 ac732b14-0923-40ee-adc3-5ebf40547c94 38414da0-02e4-495e-a681-05d85bf2eec0 Fully Paid 15302 Short Term 742.0 NaN Home Mortgage 65286.0 Debt Consolidation $935.77 21.5 47.0 10 0 45585 51802 0.0 0.0
173367 ac584085-6307-4dc2-8ed5-4cf00a6dfd3c 93b132cd-104b-4792-8c2f-a28017e87a3e Fully Paid 99999999 Short Term 731.0 NaN Home Mortgage 28035.0 Debt Consolidation $523.33 20.8 26.0 12 1 3080 10441 0.0 0.0
173545 ac838e64-261b-411f-90b0-163c9fef02ac cf2e394a-863c-4a7c-a51d-a6933db8aa01 Fully Paid 20146 Short Term 739.0 NaN Home Mortgage 71090.0 Debt Consolidation $493.49 47.9 31.0 6 1 3598 11682 1.0 0.0
174068 ad02d96e-b2a2-47ec-8e6f-4db0a77b90f6 0f65afd7-d784-4cd0-886f-0f66bbe7ec06 Fully Paid 5977 Short Term 737.0 NaN Home Mortgage 51404.0 Debt Consolidation $582.58 19.9 16.0 8 0 9561 12257 0.0 0.0
174080 ad04a669-3177-450c-adab-f52050f87228 5d05676e-9178-4141-be2e-686346c8ec1d Fully Paid 15682 Short Term 700.0 NaN Rent 73946.0 Debt Consolidation $1,429.61 28.6 32.0 9 0 9543 10487 0.0 0.0
26952 1aef5ca7-71a1-4c6e-af48-3126f0625120 4b9dfe28-69ed-41d6-9331-7bdbfe3956cc Fully Paid 2928 Short Term 746.0 NaN Home Mortgage 24398.0 Home Improvements $82.95 23.1 31.0 6 1 2208 10368 1.0 0.0
26978 1af65a97-17d8-432f-b8b4-17096aa50511 ad99750e-961d-4e30-8bda-fb2ce53bae3a Fully Paid 99999999 Short Term 724.0 NaN Home Mortgage 51903.0 Debt Consolidation $648.79 14.6 56.0 8 0 1950 3234 0.0 0.0
174036 acfb2377-afe4-49ea-8713-3b778ff88d0a c8ad7a90-20ab-48a4-b727-4bbe02afca54 Charged Off 28003 Long Term 644.0 NaN Rent 67978.0 Other $1,648.47 20.1 1.0 14 0 32953 40089 0.0 0.0
175793 aec1ac05-fa56-46e7-b56a-cf99f54c10ec 703e03dd-4975-44bb-9c5a-c829e66a426e Charged Off 4996 Short Term 714.0 NaN Home Mortgage 21707.0 Debt Consolidation $464.90 13.6 76.0 8 2 7276 11987 0.0 0.0
248985 f7f98052-5983-490f-87ef-9dad239d8218 cbd0ec21-01e5-460f-b4b9-1ca2f9004992 Fully Paid 99999999 Short Term 701.0 NaN Home Mortgage 44222.0 other $420.11 32.1 23.0 10 0 16845 21350 0.0 0.0
175826 aec8a60c-8ce9-4fb1-a010-0e345ea29b82 291d3bc3-fc82-4828-9a35-acbaab037520 Fully Paid 7800 Short Term 720.0 NaN Own Home 68250.0 Debt Consolidation $1,302.43 26.0 18.0 7 0 25542 29631 0.0 0.0
2505 0291642a-3431-42d9-91d2-f840e7edcef0 33192245-25ba-4ca5-b19b-733bd54073b1 Fully Paid 15899 Short Term 733.0 NaN Home Mortgage 70553.0 Debt Consolidation $391.57 25.7 18.0 12 0 14592 20757 0.0 0.0
175933 aee5937a-0d6f-408c-a608-eb0f2ca1fb07 ac49f838-1aff-4066-8b95-66f7aa79a4d1 Fully Paid 5125 Short Term 719.0 NaN Rent 39210.0 Other $229.70 31.8 41.0 9 1 1186 9563 1.0 0.0
248932 f7ec5f42-545d-47f0-9122-ed7de3c98ae5 154a15d1-7467-4f93-af69-b139c470239f Fully Paid 5765 Short Term 744.0 NaN Home Mortgage 51015.0 Debt Consolidation $880.01 39.6 27.0 21 0 44922 123751 0.0 0.0
175919 aee07838-218a-4711-8959-2ed1833027a1 1ee0c5d8-4232-4a50-8e56-be35d208c57f Fully Paid 11942 Short Term 739.0 NaN Home Mortgage 89568.0 other $1,492.80 25.5 18.0 8 0 34856 283381 0.0 0.0
249066 f80d7079-1cff-4670-a829-f2f7a9ac571f e1dfc891-91d4-4d73-9e42-b914eb293189 Fully Paid 12011 Short Term 738.0 NaN Home Mortgage 58921.0 Debt Consolidation $819.99 20.9 18.0 11 0 15807 23076 0.0 0.0
249095 f8141615-1977-4e65-98e3-4b873efe077e ac19299a-9247-4032-80df-6ed7a0684573 Fully Paid 3991 Short Term 734.0 NaN Own Home 14975.0 Buy a Car $49.92 17.0 44.0 4 0 738 23818 0.0 0.0
175635 ae9909fc-3fd1-4cc2-b60d-39b5dc7739de 6c3f8d54-e607-4b8e-906a-217946623698 Charged Off 7838 Short Term 720.0 NaN Home Mortgage 37866.0 Debt Consolidation $798.34 48.5 57.0 9 1 3084 6295 1.0 0.0
26152 1a1a418d-fe50-43c5-a415-52ec77aada88 0e7c0a01-244a-43e1-9d71-d8d1237ac7d2 Fully Paid 99999999 Short Term 715.0 NaN Rent 15321.0 Debt Consolidation $90.90 11.3 25.0 6 0 2719 2859 0.0 0.0
176405 af5a650b-af73-41fa-a9a0-86e480e1aa84 18918911-3c03-4f3f-a679-a3d30880a261 Fully Paid 9846 Short Term 734.0 NaN Own Home 25600.0 Debt Consolidation $509.86 18.1 14.0 7 0 5848 7776 0.0 0.0
26075 1a07527b-d426-4df9-a90d-561a0e77e121 1af45f46-23b3-44c2-918f-d3ea94de0ecd Charged Off 23402 Short Term 7260.0 NaN Own Home 63382.0 Debt Consolidation $1,061.64 45.6 57.0 8 0 29071 31092 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
210653 d199168d-74e1-4a91-8977-4a92abd3b7e6 ff0ec39b-ed73-4621-b3d0-37ca5ebdd781 Fully Paid 4660 Short Term 733.0 NaN Home Mortgage NaN Debt Consolidation $410.56 24.6 NaN 9 1 4027 6601 0.0 1.0
196939 c3d122fb-82f9-49aa-a53b-fcaa0c5005cc db5157da-7f78-467e-9ad6-ac59a02ba37a Fully Paid 4021 Short Term 733.0 NaN Home Mortgage NaN Debt Consolidation $714.70 17.6 NaN 10 1 7124 12054 1.0 0.0
73942 49754fb6-c9f7-47e1-82bc-8e36a0601e34 d7568176-b747-45ac-a37c-5f1ac79a3e88 Fully Paid 12289 Long Term 733.0 NaN Rent NaN Debt Consolidation $410.03 8.9 NaN 13 0 4700 13663 0.0 0.0
183394 b65c390f-f654-4fd3-aee3-0d5a90ceaa57 cdadacf2-762c-4d72-9b71-45d8daa3b57f Fully Paid 10664 Short Term 733.0 NaN Rent NaN Debt Consolidation $554.43 16.0 NaN 14 0 17322 27760 0.0 0.0
5598 0598ffd1-01b8-4cf8-a189-1aa921e04cc9 5b07ab32-4213-4641-8848-8b8909e978f9 Fully Paid 4052 Short Term 733.0 NaN Rent NaN Debt Consolidation $196.00 11.0 NaN 6 0 5491 7595 0.0 0.0
121922 792b4481-31b8-4ab8-82d3-0a884d97b3ef a671bd7f-633b-4dd5-8e8d-e6857f71f26f Fully Paid 28398 Long Term 733.0 NaN Rent NaN Debt Consolidation $1,899.93 42.4 NaN 16 0 28759 43182 0.0 0.0
188655 bba2d454-4163-4a97-a951-ee7a1c6b0671 40b9912e-5fff-4da0-b09a-6a7cd7fe4b64 Fully Paid 9996 Short Term 733.0 NaN Home Mortgage NaN Debt Consolidation $823.00 24.4 NaN 16 0 8674 37386 0.0 0.0
5602 059a48b2-92a0-4419-92b9-58e0c76a73d7 034ddee5-6b1f-4967-a482-3992d25180d0 Charged Off 19548 Short Term 733.0 NaN Home Mortgage NaN Debt Consolidation $626.76 29.0 NaN 11 0 21502 36819 0.0 0.0
134137 85670bd6-c686-48ca-b2f6-0d54d0eafb3b 2af61bd7-d7dc-4c1d-846f-f8e9be45c420 Fully Paid 1803 Short Term 733.0 NaN Rent NaN Debt Consolidation $295.44 19.7 NaN 11 0 14797 26236 0.0 0.0
225482 e04e59f3-45da-41b8-9103-33d9f78287ee 08a02c9a-4dc1-4c1f-91fa-e2782132551c Charged Off 5042 Short Term 733.0 NaN Home Mortgage NaN other $1,329.27 13.4 NaN 13 0 21420 21924 0.0 0.0
192595 bf86c30a-6bc9-4c3e-92e4-d0f28ddf3c3f 5e6e1943-cb63-4e56-8850-7b45945b3574 Fully Paid 15428 Short Term 733.0 NaN Rent NaN Debt Consolidation $659.17 22.1 NaN 7 0 9843 11236 0.0 0.0
83235 52a7f035-1628-4d2f-acfb-0e18102900c5 6fcd2b59-3263-4978-9f24-541ae5cd1d85 Fully Paid 5001 Short Term 733.0 NaN Rent NaN Debt Consolidation $458.09 9.4 NaN 9 1 4573 7597 1.0 0.0
161673 a0d13d77-5adc-4d4b-b5c4-54e027e53b9f e09b780c-295d-4770-b62f-b1b9b0d45b12 Fully Paid 19904 Short Term 733.0 NaN Home Mortgage NaN Debt Consolidation $442.50 38.2 NaN 9 0 22304 58085 0.0 0.0
112872 7017e2bf-addf-4921-8a5c-83033aa4ba5f 55f04f74-db14-45a7-a93b-b650d887f3a6 Fully Paid 14792 Short Term 733.0 NaN Rent NaN Debt Consolidation $1,756.26 18.0 NaN 8 0 33082 53617 0.0 0.0
62660 3e5bb348-a284-4677-9ca0-41d2d11d2296 7adca112-4d41-446f-9a32-3d9b62d86c31 Fully Paid 5996 Short Term 733.0 NaN Own Home NaN Buy House $25.05 16.9 NaN 4 0 324 19048 0.0 0.0
237038 ebda9fc3-2adc-4686-ab07-0b07a62d2f37 20f60f2f-8185-4948-9bc6-cff6d694cf45 Fully Paid 3334 Short Term 733.0 NaN Rent NaN Debt Consolidation $357.22 29.0 NaN 11 0 3551 72460 0.0 0.0
37017 24de2115-2b17-4f7c-a83a-de4e01719087 65bdda98-3efd-4c62-ade5-17413dfd00ab Fully Paid 5090 Short Term 733.0 NaN Home Mortgage NaN Debt Consolidation $267.16 10.3 NaN 4 0 4582 6005 0.0 0.0
37000 24dac85e-c66d-47ff-8114-babaee373c09 2d2f7df5-602b-4db4-aebc-5ab579232e6d Fully Paid 5928 Short Term 733.0 NaN Rent NaN Debt Consolidation $101.74 19.9 NaN 5 1 3439 10421 1.0 0.0
183515 b67a2be4-b538-417c-9ff0-52dae0f57255 effb3d59-f5b1-4249-968f-a507bc1547b0 Fully Paid 6837 Short Term 733.0 NaN Home Mortgage NaN Debt Consolidation $935.33 17.7 NaN 9 0 16108 25287 0.0 0.0
247357 f65e7d75-a80c-4e1f-b0e0-49e55726b2ca 550a58d6-5961-44e5-94a7-09c0e296a4b6 Fully Paid 2020 Short Term 733.0 NaN Home Mortgage NaN Home Improvements $168.70 24.0 NaN 4 0 6497 6767 0.0 0.0
80610 5010f786-7b3d-4aac-bc53-01d1bf673abd 1f410f88-3edd-4036-9038-628260ec9eab Fully Paid 11598 Short Term 733.0 NaN Home Mortgage NaN Debt Consolidation $420.63 39.3 NaN 9 0 3865 9884 0.0 0.0
112851 7011f892-0c94-4249-bf76-1d037cea6439 7a92e24d-10f3-496b-8dd9-b9f6bde65d36 Fully Paid 3058 Short Term 733.0 NaN HaveMortgage NaN Take a Trip $1,166.91 27.4 NaN 8 0 35654 61791 0.0 0.0
99591 62f59cb7-d5e2-48e0-a641-a34f12966770 113b1587-82c1-4460-87fa-1948108ce818 Fully Paid 11046 Short Term 733.0 NaN Rent NaN Debt Consolidation $508.99 12.7 NaN 10 0 16524 18319 0.0 0.0
35740 239eddda-a017-4052-a54f-ddf602c53055 9190a4e4-edcc-4529-99f3-e06b8e60c42f Fully Paid 7878 Short Term 733.0 NaN Own Home NaN Debt Consolidation $307.08 41.7 NaN 12 0 9432 82022 0.0 0.0
222347 dd43d4cf-f75b-4496-ae50-5db390b11c14 e5d9d904-6bb2-4475-8e4a-e7538955793e Charged Off 18011 Short Term 733.0 NaN Home Mortgage NaN Debt Consolidation $790.45 20.0 NaN 7 0 27619 34567 0.0 0.0
136736 880944f8-112e-477b-aba4-607adf7354ae b1b314ce-f59d-4a02-917c-af1dd1d3c0f8 Fully Paid 10470 Long Term 733.0 NaN Rent NaN Debt Consolidation $254.19 9.9 NaN 4 0 8419 8617 0.0 0.0
210603 d18e1dcf-b1b0-48ea-b618-0f8e68b78d95 e02cd691-2e14-4160-8f48-60722ef13e56 Fully Paid 7997 Short Term 733.0 NaN Rent NaN Debt Consolidation $487.97 9.1 NaN 15 0 10573 25114 0.0 0.0
201198 c827d04f-37c2-4ea5-8ea7-07941ee84c56 4d5379f1-4e8f-42fa-89cb-66a5a721f4dd Fully Paid 11860 Short Term 733.0 NaN Rent NaN Debt Consolidation $193.69 16.2 NaN 5 0 8339 17089 0.0 0.0
128143 7f78cf95-560e-4c5a-9363-badced6777fa 76dcd93e-3fa8-4af3-a975-c32eca83059e Fully Paid 2018 Short Term 733.0 NaN Home Mortgage NaN Medical Bills $800.68 14.5 NaN 17 0 21697 41249 0.0 0.0
130205 818f9bb4-15a4-4c64-a18f-eb403a20bb74 f3bf0af3-2746-4419-bed1-dbc488eef769 Fully Paid 1976 Short Term 733.0 NaN Rent NaN Debt Consolidation $166.00 9.6 NaN 4 0 5175 10780 0.0 0.0

8990 rows × 19 columns

In [25]:
df_cleaned['Null years in current job'] = df_cleaned['Years in current job'].isnull()
In [26]:
df_cleaned.head()
Out[26]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose Monthly Debt Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens Null years in current job
0 000025bb-5694-4cff-b17d-192b1a98ba44 5ebc8bb1-5eb9-4404-b11b-a6eebc401a19 Fully Paid 11520 Short Term 741.0 10+ years Home Mortgage 33694.0 Debt Consolidation $584.03 12.3 41.0 10 0 6760 16056 0.0 0.0 False
128265 7f99634c-cc18-4667-92b4-b9e887b7c296 e52aac79-e3f0-44cf-b538-539c0a01c849 Fully Paid 7055 Short Term 742.0 10+ years Home Mortgage 90711.0 Debt Consolidation $1,322.87 31.9 11.0 21 0 18290 19752 0.0 0.0 False
128263 7f99378c-b36e-4711-9af2-33c7239c00e7 32bdbd23-ca78-4c14-a0da-afc2a4e7a33a Fully Paid 99999999 Long Term 694.0 8 years Home Mortgage 110549.0 Debt Consolidation $1,824.06 21.6 13.0 18 0 40907 52513 0.0 0.0 False
128259 7f96d054-5de3-4f5f-9a21-d54ee9af703b 784a48cb-96ee-494c-aa7b-eb388fbf3003 Fully Paid 2745 Short Term 743.0 2 years Home Mortgage 34318.0 Debt Consolidation $506.18 16.5 68.0 6 0 5429 10501 0.0 0.0 False
128256 7f9558b3-4ed7-4fe0-8662-8021c53c0016 0971186f-b0a7-4d11-ba55-4b41348673a2 Fully Paid 3008 Short Term 750.0 2 years Rent 37104.0 Debt Consolidation $556.55 13.8 34.0 11 0 732 12621 0.0 0.0 False
In [27]:
df_cleaned['Years in current job'] = df_cleaned['Years in current job'].fillna('0')
In [28]:
df_cleaned[df_cleaned['Years in current job'].isnull()==True]
Out[28]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose Monthly Debt Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens Null years in current job
In [29]:
df_cleaned.iloc[9]
Out[29]:
Loan ID                         7f8fea60-ef3a-409b-84b8-f0f9e501293b
Customer ID                     f3c67c03-d825-451d-a1b1-a3c9ec3757a3
Loan Status                                               Fully Paid
Current Loan Amount                                         99999999
Term                                                      Short Term
Credit Score                                                     745
Years in current job                                         6 years
Home Ownership                                         Home Mortgage
Annual Income                                                  55258
Purpose                                                        other
Monthly Debt                                                 $915.90
Years of Credit History                                           25
Months since last delinquent                                      81
Number of Open Accounts                                            9
Number of Credit Problems                                          0
Current Credit Balance                                         16559
Maximum Open Credit                                            19760
Bankruptcies                                                       0
Tax Liens                                                          0
Null years in current job                                      False
Name: 128242, dtype: object
In [30]:
df_cleaned['Years in current job'][df_cleaned['Years in current job']=='< 1 year']='0'
C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
In [31]:
df_cleaned['Years in current job'].unique()
Out[31]:
array(['10+ years', '8 years', '2 years', '3 years', '6 years', '1 year',
       '5 years', '4 years', '7 years', '0', '9 years'], dtype=object)
In [32]:
df_cleaned['Over ten years'] = df_cleaned['Years in current job']==('10+ years')
In [33]:
df_cleaned.head()
Out[33]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose ... Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens Null years in current job Over ten years
0 000025bb-5694-4cff-b17d-192b1a98ba44 5ebc8bb1-5eb9-4404-b11b-a6eebc401a19 Fully Paid 11520 Short Term 741.0 10+ years Home Mortgage 33694.0 Debt Consolidation ... 12.3 41.0 10 0 6760 16056 0.0 0.0 False True
128265 7f99634c-cc18-4667-92b4-b9e887b7c296 e52aac79-e3f0-44cf-b538-539c0a01c849 Fully Paid 7055 Short Term 742.0 10+ years Home Mortgage 90711.0 Debt Consolidation ... 31.9 11.0 21 0 18290 19752 0.0 0.0 False True
128263 7f99378c-b36e-4711-9af2-33c7239c00e7 32bdbd23-ca78-4c14-a0da-afc2a4e7a33a Fully Paid 99999999 Long Term 694.0 8 years Home Mortgage 110549.0 Debt Consolidation ... 21.6 13.0 18 0 40907 52513 0.0 0.0 False False
128259 7f96d054-5de3-4f5f-9a21-d54ee9af703b 784a48cb-96ee-494c-aa7b-eb388fbf3003 Fully Paid 2745 Short Term 743.0 2 years Home Mortgage 34318.0 Debt Consolidation ... 16.5 68.0 6 0 5429 10501 0.0 0.0 False False
128256 7f9558b3-4ed7-4fe0-8662-8021c53c0016 0971186f-b0a7-4d11-ba55-4b41348673a2 Fully Paid 3008 Short Term 750.0 2 years Rent 37104.0 Debt Consolidation ... 13.8 34.0 11 0 732 12621 0.0 0.0 False False

5 rows × 21 columns

In [34]:
df_cleaned['Years in current job'][df_cleaned['Years in current job']=='10+ years']='0'
C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
In [35]:
df_cleaned['Years in current job'].unique()
Out[35]:
array(['0', '8 years', '2 years', '3 years', '6 years', '1 year',
       '5 years', '4 years', '7 years', '9 years'], dtype=object)
In [36]:
df_cleaned['Years in current job'] = df_cleaned['Years in current job'].apply(lambda x:x[0])
In [37]:
df_cleaned.head()
Out[37]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose ... Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens Null years in current job Over ten years
0 000025bb-5694-4cff-b17d-192b1a98ba44 5ebc8bb1-5eb9-4404-b11b-a6eebc401a19 Fully Paid 11520 Short Term 741.0 0 Home Mortgage 33694.0 Debt Consolidation ... 12.3 41.0 10 0 6760 16056 0.0 0.0 False True
128265 7f99634c-cc18-4667-92b4-b9e887b7c296 e52aac79-e3f0-44cf-b538-539c0a01c849 Fully Paid 7055 Short Term 742.0 0 Home Mortgage 90711.0 Debt Consolidation ... 31.9 11.0 21 0 18290 19752 0.0 0.0 False True
128263 7f99378c-b36e-4711-9af2-33c7239c00e7 32bdbd23-ca78-4c14-a0da-afc2a4e7a33a Fully Paid 99999999 Long Term 694.0 8 Home Mortgage 110549.0 Debt Consolidation ... 21.6 13.0 18 0 40907 52513 0.0 0.0 False False
128259 7f96d054-5de3-4f5f-9a21-d54ee9af703b 784a48cb-96ee-494c-aa7b-eb388fbf3003 Fully Paid 2745 Short Term 743.0 2 Home Mortgage 34318.0 Debt Consolidation ... 16.5 68.0 6 0 5429 10501 0.0 0.0 False False
128256 7f9558b3-4ed7-4fe0-8662-8021c53c0016 0971186f-b0a7-4d11-ba55-4b41348673a2 Fully Paid 3008 Short Term 750.0 2 Rent 37104.0 Debt Consolidation ... 13.8 34.0 11 0 732 12621 0.0 0.0 False False

5 rows × 21 columns

In [38]:
df_cleaned['Years in current job']=df_cleaned['Years in current job'].astype(int)
In [39]:
df_cleaned.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 215700 entries, 0 to 25243
Data columns (total 21 columns):
Loan ID                         215700 non-null object
Customer ID                     215700 non-null object
Loan Status                     215700 non-null object
Current Loan Amount             215700 non-null int64
Term                            215700 non-null object
Credit Score                    215700 non-null float64
Years in current job            215700 non-null int32
Home Ownership                  215700 non-null object
Annual Income                   171202 non-null float64
Purpose                         215700 non-null object
Monthly Debt                    215700 non-null object
Years of Credit History         215700 non-null float64
Months since last delinquent    97438 non-null float64
Number of Open Accounts         215700 non-null int64
Number of Credit Problems       215700 non-null int64
Current Credit Balance          215700 non-null int64
Maximum Open Credit             215700 non-null object
Bankruptcies                    215248 non-null float64
Tax Liens                       215678 non-null float64
Null years in current job       215700 non-null bool
Over ten years                  215700 non-null bool
dtypes: bool(2), float64(6), int32(1), int64(4), object(8)
memory usage: 32.5+ MB
In [40]:
filledIncome = df_cleaned[df_cleaned['Annual Income'].notnull()]
plt.hist(filledIncome['Annual Income'], bins = 10)
plt.show()
In [41]:
df_cleaned['Annual Income'].median()
Out[41]:
62105.0
In [42]:
df_cleaned['Annual Income']= df_cleaned['Annual Income'].fillna(df_cleaned['Annual Income'].median())
In [43]:
df_cleaned['Annual Income'].isnull().sum()
Out[43]:
0
In [44]:
df_cleaned['Months since last delinquent'].unique()
Out[44]:
array([ 41.,  11.,  13.,  68.,  34.,   5.,  55.,  17.,  14.,  81.,  36.,
        40.,   6.,  12.,  78.,   8.,  18.,  28.,  35.,  16.,  49.,  24.,
        53.,  15.,  63.,  70.,  52.,  43.,  22.,  30.,  29.,  74.,  10.,
         4.,  48.,  67.,   7.,  73.,   3.,  42.,  21.,  31.,  79.,  44.,
        54.,  27.,  61.,  65.,  25.,  19.,  80.,  57.,  51.,  32.,  75.,
        62.,   2.,  23.,  64.,  37.,  26.,  66.,  59.,  82.,  46.,   9.,
        20.,  76.,  58.,  47.,  60.,  56.,  45.,  71.,  77., 115.,  50.,
        39.,  69.,  33.,  72.,  38.,   1.,   0.,  85.,  89.,  96.,  83.,
       140.,  91., 114., 143.,  94.,  95.,  86.,  88.,  84.,  98., 135.,
       100.,  97.,  93.,  99., 130., 106.,  87., 122.,  92., 101., 152.,
       139., 148., 109., 134., 113., 131., 119., 133., 110., 120.,  90.,
       102., 151., 176., 149., 116., 104., 107.,  nan, 118., 112., 108.])
In [45]:
filledMon = df_cleaned[df_cleaned['Months since last delinquent'].notnull()]
plt.hist(filledMon['Months since last delinquent'], bins = 100)
plt.show()
In [46]:
df_cleaned['Months since last delinquent'].median()
Out[46]:
32.0
In [47]:
df_cleaned['Months since last delinquent']=df_cleaned['Months since last delinquent'].fillna(32.0)
#seperate column for null
In [48]:
df_cleaned['Bankruptcies'].unique()
Out[48]:
array([ 0.,  1.,  3.,  2.,  4.,  5.,  7.,  6., nan])
In [49]:
filledBank = df_cleaned[df_cleaned['Bankruptcies'].notnull()]
plt.hist(filledBank['Bankruptcies'], bins = 10)
plt.show()
In [50]:
df_cleaned['Bankruptcies'].median()
Out[50]:
0.0
In [51]:
df_cleaned['Bankruptcies'] = df_cleaned['Bankruptcies'].fillna(0.0)
In [52]:
df_cleaned.head()
Out[52]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose ... Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens Null years in current job Over ten years
0 000025bb-5694-4cff-b17d-192b1a98ba44 5ebc8bb1-5eb9-4404-b11b-a6eebc401a19 Fully Paid 11520 Short Term 741.0 0 Home Mortgage 33694.0 Debt Consolidation ... 12.3 41.0 10 0 6760 16056 0.0 0.0 False True
128265 7f99634c-cc18-4667-92b4-b9e887b7c296 e52aac79-e3f0-44cf-b538-539c0a01c849 Fully Paid 7055 Short Term 742.0 0 Home Mortgage 90711.0 Debt Consolidation ... 31.9 11.0 21 0 18290 19752 0.0 0.0 False True
128263 7f99378c-b36e-4711-9af2-33c7239c00e7 32bdbd23-ca78-4c14-a0da-afc2a4e7a33a Fully Paid 99999999 Long Term 694.0 8 Home Mortgage 110549.0 Debt Consolidation ... 21.6 13.0 18 0 40907 52513 0.0 0.0 False False
128259 7f96d054-5de3-4f5f-9a21-d54ee9af703b 784a48cb-96ee-494c-aa7b-eb388fbf3003 Fully Paid 2745 Short Term 743.0 2 Home Mortgage 34318.0 Debt Consolidation ... 16.5 68.0 6 0 5429 10501 0.0 0.0 False False
128256 7f9558b3-4ed7-4fe0-8662-8021c53c0016 0971186f-b0a7-4d11-ba55-4b41348673a2 Fully Paid 3008 Short Term 750.0 2 Rent 37104.0 Debt Consolidation ... 13.8 34.0 11 0 732 12621 0.0 0.0 False False

5 rows × 21 columns

In [53]:
df_cleaned['Bankruptcies'].isnull().sum()
Out[53]:
0
In [54]:
df_cleaned['Tax Liens'].unique()
Out[54]:
array([ 0.,  2.,  1.,  3.,  5.,  4.,  9.,  6.,  7.,  8., 10., 11., nan])
In [55]:
filledLiens = df_cleaned[df_cleaned['Tax Liens'].notnull()]
plt.hist(filledLiens['Tax Liens'], bins = 10)
plt.show()
In [56]:
df_cleaned['Tax Liens'] = df_cleaned['Tax Liens'].fillna(df_cleaned['Tax Liens'].median())
In [57]:
df_cleaned.head()
Out[57]:
Loan ID Customer ID Loan Status Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose ... Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens Null years in current job Over ten years
0 000025bb-5694-4cff-b17d-192b1a98ba44 5ebc8bb1-5eb9-4404-b11b-a6eebc401a19 Fully Paid 11520 Short Term 741.0 0 Home Mortgage 33694.0 Debt Consolidation ... 12.3 41.0 10 0 6760 16056 0.0 0.0 False True
128265 7f99634c-cc18-4667-92b4-b9e887b7c296 e52aac79-e3f0-44cf-b538-539c0a01c849 Fully Paid 7055 Short Term 742.0 0 Home Mortgage 90711.0 Debt Consolidation ... 31.9 11.0 21 0 18290 19752 0.0 0.0 False True
128263 7f99378c-b36e-4711-9af2-33c7239c00e7 32bdbd23-ca78-4c14-a0da-afc2a4e7a33a Fully Paid 99999999 Long Term 694.0 8 Home Mortgage 110549.0 Debt Consolidation ... 21.6 13.0 18 0 40907 52513 0.0 0.0 False False
128259 7f96d054-5de3-4f5f-9a21-d54ee9af703b 784a48cb-96ee-494c-aa7b-eb388fbf3003 Fully Paid 2745 Short Term 743.0 2 Home Mortgage 34318.0 Debt Consolidation ... 16.5 68.0 6 0 5429 10501 0.0 0.0 False False
128256 7f9558b3-4ed7-4fe0-8662-8021c53c0016 0971186f-b0a7-4d11-ba55-4b41348673a2 Fully Paid 3008 Short Term 750.0 2 Rent 37104.0 Debt Consolidation ... 13.8 34.0 11 0 732 12621 0.0 0.0 False False

5 rows × 21 columns

Double check to make sure that there are no more missing values

In [58]:
df_cleaned.isnull().sum()
Out[58]:
Loan ID                         0
Customer ID                     0
Loan Status                     0
Current Loan Amount             0
Term                            0
Credit Score                    0
Years in current job            0
Home Ownership                  0
Annual Income                   0
Purpose                         0
Monthly Debt                    0
Years of Credit History         0
Months since last delinquent    0
Number of Open Accounts         0
Number of Credit Problems       0
Current Credit Balance          0
Maximum Open Credit             0
Bankruptcies                    0
Tax Liens                       0
Null years in current job       0
Over ten years                  0
dtype: int64

Analye the unique values of each column to check for spelling mistakes and other mistakes that might affect the model

In [59]:
for item in df_cleaned.columns:
    print(item + ' '+ df[item].dtype.kind)
    print(df_cleaned[item].unique())
Loan ID O
['000025bb-5694-4cff-b17d-192b1a98ba44'
 '7f99634c-cc18-4667-92b4-b9e887b7c296'
 '7f99378c-b36e-4711-9af2-33c7239c00e7' ...
 '7f78cf95-560e-4c5a-9363-badced6777fa'
 '818f9bb4-15a4-4c64-a18f-eb403a20bb74'
 '193d7f29-89f9-4c59-9e44-c1ca8ea36209']
Customer ID O
['5ebc8bb1-5eb9-4404-b11b-a6eebc401a19'
 'e52aac79-e3f0-44cf-b538-539c0a01c849'
 '32bdbd23-ca78-4c14-a0da-afc2a4e7a33a' ...
 '76dcd93e-3fa8-4af3-a975-c32eca83059e'
 'f3bf0af3-2746-4419-bed1-dbc488eef769'
 '3b462912-a081-4407-a1b1-3553a29bb974']
Loan Status O
['Fully Paid' 'Charged Off']
Current Loan Amount i
[   11520     7055 99999999 ...    20512     1089     2216]
Term O
['Short Term' 'Long Term']
Credit Score f
[ 741.  742.  694.  743.  750. 7400.  717. 6910.  740.  745.  731.  732.
  687.  689.  716.  736.  724.  690.  734.  680.  723.  726. 7410.  692.
  730.  739.  703. 7500.  735.  662.  747.  722.  713.  738.  659.  711.
  658.  702.  696.  718.  681.  709.  720.  728.  729.  695.  733.  708.
  748.  697.  626.  701.  725.  700.  649.  698.  714.  712.  749.  746.
  707.  634.  710.  627.  744. 6890.  668.  652.  706.  640.  751.  737.
  647.  721.  715. 7270.  633.  641. 7260.  705.  686.  727.  671.  675.
  663.  699.  693.  679.  664. 6700.  644.  666.  676. 7080.  669.  684.
  665. 7150.  645.  691.  683.  719.  660. 7430. 6960.  667. 7240.  639.
  704.  597. 7300. 7210. 6800.  638.  688. 7420.  651.  656.  620. 6990.
 7330.  643. 7450.  677.  590.  661. 7360.  630. 6920. 7130.  685.  608.
  674.  648.  673. 7040.  617.  625. 7010. 7470. 6660. 7370.  678.  618.
  637. 7280. 6650.  607.  655.  602.  682.  654.  599. 6300.  603. 6740.
 6980. 7250. 7390.  610. 7380. 7140.  670. 6480.  646. 5960.  672. 6860.
 7440.  636. 7290.  600.  657.  624. 7070. 7350. 7310. 6940. 6540. 6830.
 6080. 6470. 6770. 7020. 7320.  635.  585.  596.  621.  611. 7090. 6580.
  605.  632. 6350. 6930. 7000.  586. 7100. 6790. 6750. 7170. 6870. 6780.
 7480. 5920. 6730. 6710.  653. 7340. 6490.  629. 7120. 7220.  650. 7180.
  615. 7200.  642.  619. 6970. 6950. 6070. 7460.  623.  622.  614.  631.
  595. 6760. 6430.  616. 7060. 7190. 7110. 7050. 6280.  612.  589. 6570.
 7230.  592. 6250. 6810. 6230. 6560. 7160. 6610.  628. 6690. 6320. 6220.
 6240.  606. 6270. 6820. 6850. 6510.  588. 6340.  613.  604. 6460. 6550.
  601. 6590. 7490. 6880. 6840. 5970. 6160.  587. 7510. 5910. 6630. 6020.
 6900. 6400.  594. 6190.  609. 6410. 6420. 6720.  591. 6450. 6680. 6370.
 6600. 6010. 6380.  598. 6360. 6090. 7030. 6670. 6390. 5990. 6030. 6290.
 6330. 6180. 6110. 6200. 6500. 6620. 5930.  593. 6440. 6150. 6520. 6100.
 6060. 6530. 5980. 6170. 6260. 6210. 5950. 6050. 6310. 6640. 5870. 5890.
 6130. 6140. 6000. 5850. 5940. 5900. 5860. 6040. 6120. 5880.]
Years in current job O
[0 8 2 3 6 1 5 4 7 9]
Home Ownership O
['Home Mortgage' 'Rent' 'Own Home' 'HaveMortgage']
Annual Income f
[ 33694.  90711. 110549. ...  42653.  45661. 144275.]
Purpose O
['Debt Consolidation' 'other' 'Other' 'Take a Trip' 'Business Loan'
 'Home Improvements' 'Medical Bills' 'Buy House' 'Buy a Car'
 'Educational Expenses']
Monthly Debt O
['$584.03' '$1,322.87 ' '$1,824.06 ' ... '$1,166.91 ' '$254.19' '$166.00']
Years of Credit History f
[12.3 31.9 21.6 16.5 13.8 12.9 22.4 13.1 29.  25.  26.9 20.8 18.4 23.9
  9.5 13.3 32.5 38.5 14.4 19.8 15.  15.7 23.3 14.5 13.7 19.9 19.2 23.8
 22.3 11.3 12.  12.5 21.5 11.9 20.1 22.8 22.5 28.6 18.2 15.4 17.2 14.7
 43.3 31.1 18.1 18.8 15.1 16.9 12.2 26.6 40.7 17.  28.  13.9 12.6 21.
 13.  17.1  7.5 16.  20.2 12.4 13.2 13.5 21.8 17.7 14.9 32.4 14.  19.
 29.1 20.3 22.1 17.5 22.6 11.6  9.4 10.4 24.2 14.6 16.7 28.7 25.7 16.8
 23.  15.6 26.2 23.5 20.5 22.  25.9 34.4 12.8 14.8 26.   8.  21.3 10.3
 20.   9.  19.1 16.6 19.4 15.5 17.9 42.8 30.  11.2 17.4 20.6 21.4 21.9
 10.8 24.  27.  14.1 19.6 19.5 18.  29.8 31.8 15.2 19.3 21.2 11.  17.3
 20.7 26.8 25.3 23.2 26.3 14.3 30.1 15.3 33.6 16.2 11.5 18.7 18.6 25.4
 17.8 23.7 24.5 15.9 30.3 18.9 26.4 27.5 19.7 12.1 10.6 21.1 18.5 34.9
 29.2 20.4 31.  16.3 23.4 32.7 21.7 29.6 32.2 10.5  9.8 30.4 22.2 30.8
 27.2 27.6 31.4 18.3  8.9 10.  26.7 14.2 22.9 24.9 29.9 36.4 23.6 32.
 27.4 17.6 11.7 22.7 16.4 28.4 23.1 25.2  6.7 30.9  8.3 29.5  9.2 16.1
 42.5  9.3  8.2 37.  24.7 28.5 33.9 44.5 11.4 27.3 39.1 33.8 12.7 34.6
 33.5 28.8 10.7 11.8 35.  13.4 27.1 10.9 28.3 24.1 35.5 26.1 36.5 24.4
 30.6 39.  41.9 30.2 25.8 33.  25.5 28.9  7.2 35.4 29.3 31.7 24.8 29.7
  6.9 31.6 10.2  9.1 35.6  7.7 38.3 20.9 43.9  5.4  7.3 24.6 42.  53.5
  8.5 13.6 34.5 33.4 15.8 38.4 33.1 35.9 37.2 37.7  5.  11.1  6.  26.5
 31.5 30.5 38.2  9.6  7.6 42.1 48.2  8.4 35.1 31.3  9.9 24.3  5.8 32.8
 25.1 25.6 28.1 27.7 32.9 36.  38.7 34.3 39.2  8.7 37.3  9.7 38.1 28.2
 36.3 32.1 32.3 44.6 27.9 70.5 10.1 29.4 37.1 34.1 37.6  5.7 33.7 39.5
 41.6 54.2 30.7  8.1 41.3 31.2 46.8 45.1 40.1 39.4 32.6 33.2 27.8 50.9
 34.2  6.3 40.  36.6 49.2 34.7 37.5 47.5 34.  38.6 37.8 36.2  8.6 41.
 33.3  8.8 41.1  5.5 38.   7.1  6.2 39.3 51.7 35.8 46.4  7.4 35.2 41.8
 40.6 35.3  7.  42.3 36.1 40.5 44.4  7.9 34.8 43.4 45.2 36.8 40.2 37.9
 44.9  6.8 36.7  6.4 38.8 48.7 39.6 50.2  5.9 46.3 47.6 41.5  4.6 41.4
 40.4 49.  41.7  4.7 39.8 37.4 39.9 44.1 43.7 41.2 45.5 47.3  4.5 48.
 44.8 48.9 49.1 45.9 36.9 38.9  6.6 43.1 47.4  5.1  6.1 53.4 39.7 42.4
 50.8  4.8 50.7 40.8 46.1 43.6 47.7 44.2 42.6 46.5 42.7 46.7 49.9 43.2
  5.6 50.  42.2 44.7 55.4  5.3 49.5 47.9  6.5 48.3  7.8 40.9 45.  46.2
 48.1 51.  40.3 46.6 43.  47.1 43.8 43.5 51.8 48.5 45.3 44.3 46.  44.
 35.7 51.4 49.3 58.8 47.  50.3  4.9 50.5  4.2 55.8 48.8 52.5 61.7 42.9
 55.7  5.2 52.2 48.4 53.1 46.9 51.1 53.6  4.1 45.8 47.2 49.7 59.1 51.9
 54.1 49.4 56.8 54.  52.6 55.5 55.6 51.6 45.6 50.1  4.4 48.6 49.8 52.3
 60.5 50.6 45.4 45.7 49.6 54.5 53.9  4.3 53.   4.  51.3 52.9  3.9 60.7
 51.5  3.8 57.8  3.7 57.6 54.6 59.7 56.9 66.   3.4 50.4 56.1 57.5 59.9
 54.3 57.7 55.  47.8 61.5 61.3 57.1 56.  53.8 62.5 52.4 53.7 65.8 54.7
 55.3 64.6 65.  58.  52.1 56.5 52.8 61.8 57. ]
Months since last delinquent f
[ 41.  11.  13.  68.  34.   5.  55.  17.  14.  81.  36.  40.   6.  12.
  78.   8.  18.  28.  35.  16.  49.  24.  53.  15.  63.  70.  52.  43.
  22.  30.  29.  74.  10.   4.  48.  67.   7.  73.   3.  42.  21.  31.
  79.  44.  54.  27.  61.  65.  25.  19.  80.  57.  51.  32.  75.  62.
   2.  23.  64.  37.  26.  66.  59.  82.  46.   9.  20.  76.  58.  47.
  60.  56.  45.  71.  77. 115.  50.  39.  69.  33.  72.  38.   1.   0.
  85.  89.  96.  83. 140.  91. 114. 143.  94.  95.  86.  88.  84.  98.
 135. 100.  97.  93.  99. 130. 106.  87. 122.  92. 101. 152. 139. 148.
 109. 134. 113. 131. 119. 133. 110. 120.  90. 102. 151. 176. 149. 116.
 104. 107. 118. 112. 108.]
Number of Open Accounts i
[10 21 18  6 11 12  9 13 20  8  4 14  5  7 15 16 32 17 33 22 19  3 26  2
 24 36 28 37 25 31 34 23 27 29 30 55 35 38 39 40 42 41  0 44 43 46 45  1
 51 50 47 53 48 56 54 52 76 49 58]
Number of Credit Problems i
[ 0  1  2  3  5  4  9  8  6  7 10 11]
Current Credit Balance i
[ 6760 18290 40907 ... 33838 34838 40652]
Maximum Open Credit O
[16056 19752 52513 ... '25287' 61791 82022]
Bankruptcies f
[0. 1. 3. 2. 4. 5. 7. 6.]
Tax Liens f
[ 0.  2.  1.  3.  5.  4.  9.  6.  7.  8. 10. 11.]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3062             try:
-> 3063                 return self._engine.get_loc(key)
   3064             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Null years in current job'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-59-0a4322840b28> in <module>()
      1 for item in df_cleaned.columns:
----> 2     print(item + ' '+ df[item].dtype.kind)
      3     print(df_cleaned[item].unique())

C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2683             return self._getitem_multilevel(key)
   2684         else:
-> 2685             return self._getitem_column(key)
   2686 
   2687     def _getitem_column(self, key):

C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2690         # get column
   2691         if self.columns.is_unique:
-> 2692             return self._get_item_cache(key)
   2693 
   2694         # duplicate columns & possible reduce dimensionality

C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   2484         res = cache.get(item)
   2485         if res is None:
-> 2486             values = self._data.get(item)
   2487             res = self._box_item_values(item, values)
   2488             cache[item] = res

C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   4113 
   4114             if not isna(item):
-> 4115                 loc = self.items.get_loc(item)
   4116             else:
   4117                 indexer = np.arange(len(self.items))[isna(self.items)]

C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3063                 return self._engine.get_loc(key)
   3064             except KeyError:
-> 3065                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   3066 
   3067         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Null years in current job'

Fix spelling errors in the dataset

In [61]:
df_cleaned['Home Ownership'][df_cleaned['Home Ownership']=='HaveMortgage']='Home Mortgage'
C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
In [62]:
df_cleaned['Purpose'][df_cleaned['Purpose']=='other']='Other'
C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
In [63]:
df_cleaned['Purpose'].unique()
Out[63]:
array(['Debt Consolidation', 'Other', 'Take a Trip', 'Business Loan',
       'Home Improvements', 'Medical Bills', 'Buy House', 'Buy a Car',
       'Educational Expenses'], dtype=object)

Convert the "Monthly Debt" column into numerical form

In [64]:
df_cleaned['Monthly Debt'].head()
Out[64]:
0            $584.03
128265    $1,322.87 
128263    $1,824.06 
128259       $506.18
128256       $556.55
Name: Monthly Debt, dtype: object
In [65]:
df_cleaned['Monthly Debt'] = df_cleaned['Monthly Debt'].apply(lambda x:x[x.find('$')+1:])
In [66]:
df_cleaned['Monthly Debt']=df_cleaned['Monthly Debt'].str.replace(',','')
In [67]:
df_cleaned['Monthly Debt']=df_cleaned[['Monthly Debt']].astype(float)

Check the data types of each column

In [68]:
df_cleaned.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 215700 entries, 0 to 25243
Data columns (total 21 columns):
Loan ID                         215700 non-null object
Customer ID                     215700 non-null object
Loan Status                     215700 non-null object
Current Loan Amount             215700 non-null int64
Term                            215700 non-null object
Credit Score                    215700 non-null float64
Years in current job            215700 non-null int32
Home Ownership                  215700 non-null object
Annual Income                   215700 non-null float64
Purpose                         215700 non-null object
Monthly Debt                    215700 non-null float64
Years of Credit History         215700 non-null float64
Months since last delinquent    215700 non-null float64
Number of Open Accounts         215700 non-null int64
Number of Credit Problems       215700 non-null int64
Current Credit Balance          215700 non-null int64
Maximum Open Credit             215700 non-null object
Bankruptcies                    215700 non-null float64
Tax Liens                       215700 non-null float64
Null years in current job       215700 non-null bool
Over ten years                  215700 non-null bool
dtypes: bool(2), float64(7), int32(1), int64(4), object(7)
memory usage: 32.5+ MB

Look for the values in "Maximum Open Credit" that cannot be converted to floats

In [69]:
df_cleaned['Maximum Open Credit'][df_cleaned['Maximum Open Credit']=='#VALUE!']
Out[69]:
184663    #VALUE!
187513    #VALUE!
Name: Maximum Open Credit, dtype: object

Drop the two rows with "#VALUE!" in its "Maximum Open Credit" column

In [70]:
df_cleaned.drop(184663,axis = 0, inplace = True)
In [71]:
df_cleaned.iloc[184663]
Out[71]:
Loan ID                         191696df-dfb3-4059-9bbe-bb803e08af16
Customer ID                     c736d42a-a2a5-419d-824b-f4a1ceddd2b1
Loan Status                                               Fully Paid
Current Loan Amount                                            18484
Term                                                      Short Term
Credit Score                                                     720
Years in current job                                               0
Home Ownership                                         Home Mortgage
Annual Income                                                  46149
Purpose                                           Debt Consolidation
Monthly Debt                                                 1126.81
Years of Credit History                                         21.5
Months since last delinquent                                      32
Number of Open Accounts                                           12
Number of Credit Problems                                          0
Current Credit Balance                                         21871
Maximum Open Credit                                            23168
Bankruptcies                                                       0
Tax Liens                                                          0
Null years in current job                                       True
Over ten years                                                 False
Name: 25090, dtype: object
In [72]:
df_cleaned['Maximum Open Credit'][df_cleaned['Maximum Open Credit']=='#VALUE!']
Out[72]:
187513    #VALUE!
Name: Maximum Open Credit, dtype: object
In [73]:
df_cleaned.drop(187513,axis = 0, inplace = True)
In [74]:
df_cleaned['Maximum Open Credit']=df_cleaned['Maximum Open Credit'].astype(int)
In [75]:
#check for giant numbers
df_cleaned.describe()
Out[75]:
Current Loan Amount Credit Score Years in current job Annual Income Monthly Debt Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens
count 2.156980e+05 215698.000000 215698.000000 2.156980e+05 215698.000000 215698.000000 215698.000000 215698.000000 215698.000000 2.156980e+05 2.156980e+05 215698.000000 215698.000000
mean 1.633530e+07 972.911524 2.536903 7.059750e+04 960.396162 18.359797 33.369660 11.077627 0.155426 1.545737e+04 3.698387e+04 0.110326 0.026328
std 3.695320e+07 1233.537371 2.883904 5.123631e+04 634.781931 7.058020 14.736628 4.971806 0.457974 1.975048e+04 6.011805e+05 0.336539 0.240842
min 5.050000e+02 585.000000 0.000000 0.000000e+00 0.000000 3.400000 0.000000 0.000000 0.000000 0.000000e+00 0.000000e+00 0.000000 0.000000
25% 8.328250e+03 721.000000 0.000000 4.935000e+04 527.660000 13.600000 32.000000 8.000000 0.000000 5.932000e+03 1.296100e+04 0.000000 0.000000
50% 1.464200e+04 733.000000 2.000000 6.210500e+04 840.545000 17.000000 32.000000 10.000000 0.000000 1.104200e+04 2.206000e+04 0.000000 0.000000
75% 2.520800e+04 741.000000 5.000000 7.960000e+04 1249.942500 21.800000 32.000000 14.000000 0.000000 1.932400e+04 3.681000e+04 0.000000 0.000000
max 1.000000e+08 7510.000000 9.000000 8.713547e+06 22939.120000 70.500000 176.000000 76.000000 11.000000 1.731412e+06 1.763322e+08 7.000000 11.000000

Split my dataset into a dataset for the independent variables and a dataset for the dependent variables

In [76]:
x = df_cleaned.drop('Loan Status', axis = 1)
y = pd.get_dummies(df_cleaned['Loan Status'], drop_first = True)
In [77]:
x.head()
Out[77]:
Loan ID Customer ID Current Loan Amount Term Credit Score Years in current job Home Ownership Annual Income Purpose Monthly Debt Years of Credit History Months since last delinquent Number of Open Accounts Number of Credit Problems Current Credit Balance Maximum Open Credit Bankruptcies Tax Liens Null years in current job Over ten years
0 000025bb-5694-4cff-b17d-192b1a98ba44 5ebc8bb1-5eb9-4404-b11b-a6eebc401a19 11520 Short Term 741.0 0 Home Mortgage 33694.0 Debt Consolidation 584.03 12.3 41.0 10 0 6760 16056 0.0 0.0 False True
128265 7f99634c-cc18-4667-92b4-b9e887b7c296 e52aac79-e3f0-44cf-b538-539c0a01c849 7055 Short Term 742.0 0 Home Mortgage 90711.0 Debt Consolidation 1322.87 31.9 11.0 21 0 18290 19752 0.0 0.0 False True
128263 7f99378c-b36e-4711-9af2-33c7239c00e7 32bdbd23-ca78-4c14-a0da-afc2a4e7a33a 99999999 Long Term 694.0 8 Home Mortgage 110549.0 Debt Consolidation 1824.06 21.6 13.0 18 0 40907 52513 0.0 0.0 False False
128259 7f96d054-5de3-4f5f-9a21-d54ee9af703b 784a48cb-96ee-494c-aa7b-eb388fbf3003 2745 Short Term 743.0 2 Home Mortgage 34318.0 Debt Consolidation 506.18 16.5 68.0 6 0 5429 10501 0.0 0.0 False False
128256 7f9558b3-4ed7-4fe0-8662-8021c53c0016 0971186f-b0a7-4d11-ba55-4b41348673a2 3008 Short Term 750.0 2 Rent 37104.0 Debt Consolidation 556.55 13.8 34.0 11 0 732 12621 0.0 0.0 False False
In [78]:
x.drop('Loan ID', axis = 1, inplace = True)
x.drop('Customer ID', axis = 1, inplace = True)
In [79]:
X = pd.get_dummies(x, drop_first = True)
In [80]:
X.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 215698 entries, 0 to 25243
Data columns (total 26 columns):
Current Loan Amount             215698 non-null int64
Credit Score                    215698 non-null float64
Years in current job            215698 non-null int32
Annual Income                   215698 non-null float64
Monthly Debt                    215698 non-null float64
Years of Credit History         215698 non-null float64
Months since last delinquent    215698 non-null float64
Number of Open Accounts         215698 non-null int64
Number of Credit Problems       215698 non-null int64
Current Credit Balance          215698 non-null int64
Maximum Open Credit             215698 non-null int32
Bankruptcies                    215698 non-null float64
Tax Liens                       215698 non-null float64
Null years in current job       215698 non-null bool
Over ten years                  215698 non-null bool
Term_Short Term                 215698 non-null uint8
Home Ownership_Own Home         215698 non-null uint8
Home Ownership_Rent             215698 non-null uint8
Purpose_Buy House               215698 non-null uint8
Purpose_Buy a Car               215698 non-null uint8
Purpose_Debt Consolidation      215698 non-null uint8
Purpose_Educational Expenses    215698 non-null uint8
Purpose_Home Improvements       215698 non-null uint8
Purpose_Medical Bills           215698 non-null uint8
Purpose_Other                   215698 non-null uint8
Purpose_Take a Trip             215698 non-null uint8
dtypes: bool(2), float64(7), int32(2), int64(4), uint8(11)
memory usage: 24.1 MB
In [81]:
y.head()
Out[81]:
Fully Paid
0 1
128265 1
128263 1
128259 1
128256 1

Train random forest classifier, gradient boosting classifier, logistic regression, and k-nearest neighbors classifier models

In [82]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,train_size = 0.8, random_state = 0)
C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\sklearn\model_selection\_split.py:2026: FutureWarning: From version 0.21, test_size will always complement train_size unless both are specified.
  FutureWarning)
In [83]:
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators = 20)
rfc=rfc.fit(X_train,y_train)
C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\ipykernel_launcher.py:3: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  This is separate from the ipykernel package so we can avoid doing imports until
In [99]:
from sklearn.ensemble import GradientBoostingClassifier
gbc = GradientBoostingClassifier(n_estimators=50,learning_rate=.1)
model_gbc = gbc.fit(X_train,y_train)
C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\sklearn\utils\validation.py:578: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
In [90]:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
logmodel = lr.fit(X_train,y_train)
C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\sklearn\utils\validation.py:578: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
In [88]:
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
scaler = StandardScaler()
data_scaled = scaler.fit_transform(X_train)
neigh = KNeighborsClassifier(n_neighbors = 21)
model = neigh.fit(data_scaled,y_train)
C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\ipykernel_launcher.py:6: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  
In [91]:
rfc_pred = rfc.predict(X_test)
gbc_pred = model_gbc.predict(X_test)
log_pred = logmodel.predict(X_test)
test_scaled = scaler.transform(X_test)
k_pred = model.predict(test_scaled)
In [92]:
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score

Examine the accuracy, f1, recall, and precision scores of each model

In [101]:
pred_list = [rfc_pred,gbc_pred,log_pred,k_pred]
for i in range(4):
    print(i)
    print(accuracy_score(y_test,pred_list[i]))
    print(f1_score(y_test,pred_list[i]))
    print(recall_score(y_test,pred_list[i]))
    print(precision_score(y_test,pred_list[i]))
0
0.8512517385257302
0.9140423023857045
0.9676120249574589
0.8660929606782931
1
0.856583217431618
0.918993937965618
0.9953204764605785
0.8535398983388865
2
0.8530366249420491
0.917407050363461
0.9986103233125354
0.8484169437617464
3
0.8543347241539174
0.9177788245145758
0.9946681792399319
0.8519238243295764

The gradient boosting model was selected as our predictive model since it had the highest scores (accuracy: 85.7%, f1: 91.9%, recall: 00.5%, precision: 85.3%)

Create a csv file to showcase my results

In [149]:
results = df_cleaned[['Loan ID']]
results['Customer ID'] = df_cleaned['Customer ID']
C:\User_Files\Lucy_Wan\Programming\Anaconda2\lib\site-packages\ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  
In [150]:
predict = pd.DataFrame(model_gbc.predict_proba(X))
predict.head()
Out[150]:
0 1
0 0.178703 0.821297
1 0.100057 0.899943
2 0.009637 0.990363
3 0.141215 0.858785
4 0.113016 0.886984
In [151]:
predict = predict.rename(columns = {0:"Probability of Charged Off", 1:"Probability of Fully Paid"})
In [152]:
results = pd.merge(results,predict, left_index = True, right_index = True)
In [153]:
model_pred = pd.DataFrame(model_gbc.predict(X))
In [154]:
model_pred[model_pred==1]='Fully Paid'
model_pred[model_pred==0]='Charged Off'
In [155]:
results['Loan Status Predicted By Model'] = model_pred
In [156]:
results['Actual Loan Status'] = df_cleaned['Loan Status']
In [157]:
results.head()
Out[157]:
Loan ID Customer ID Probability of Charged Off Probability of Fully Paid Loan Status Predicted By Model Actual Loan Status
0 000025bb-5694-4cff-b17d-192b1a98ba44 5ebc8bb1-5eb9-4404-b11b-a6eebc401a19 0.178703 0.821297 Fully Paid Fully Paid
128265 7f99634c-cc18-4667-92b4-b9e887b7c296 e52aac79-e3f0-44cf-b538-539c0a01c849 0.104818 0.895182 Fully Paid Fully Paid
128263 7f99378c-b36e-4711-9af2-33c7239c00e7 32bdbd23-ca78-4c14-a0da-afc2a4e7a33a 0.089349 0.910651 Fully Paid Fully Paid
128259 7f96d054-5de3-4f5f-9a21-d54ee9af703b 784a48cb-96ee-494c-aa7b-eb388fbf3003 0.009822 0.990178 Fully Paid Fully Paid
128256 7f9558b3-4ed7-4fe0-8662-8021c53c0016 0971186f-b0a7-4d11-ba55-4b41348673a2 0.145427 0.854573 Fully Paid Fully Paid
In [118]:
results.to_csv('results.csv')
In [103]:
features = pd.DataFrame(model_gbc.feature_importances_)
In [104]:
features['feature']=X_train.columns

Examine which features were the most important to my model

In [105]:
features
Out[105]:
0 feature
0 0.203020 Current Loan Amount
1 0.444981 Credit Score
2 0.000000 Years in current job
3 0.173322 Annual Income
4 0.040899 Monthly Debt
5 0.000000 Years of Credit History
6 0.008935 Months since last delinquent
7 0.003361 Number of Open Accounts
8 0.000000 Number of Credit Problems
9 0.004552 Current Credit Balance
10 0.006288 Maximum Open Credit
11 0.000000 Bankruptcies
12 0.000000 Tax Liens
13 0.028454 Null years in current job
14 0.000000 Over ten years
15 0.054749 Term_Short Term
16 0.000000 Home Ownership_Own Home
17 0.031438 Home Ownership_Rent
18 0.000000 Purpose_Buy House
19 0.000000 Purpose_Buy a Car
20 0.000000 Purpose_Debt Consolidation
21 0.000000 Purpose_Educational Expenses
22 0.000000 Purpose_Home Improvements
23 0.000000 Purpose_Medical Bills
24 0.000000 Purpose_Other
25 0.000000 Purpose_Take a Trip