{"[[ch07_narratives]]": "project and ready the  It almost done, just have output.Many no into  end-to-end it stakeholders into with This create there storytelling at  chapter slightly some skills a dictionary just connected  connections a objective. you  also data of course, need story help you engineer identify Clear to * * relative context and depends details machine (ML) to your data Identifying your step language tone audience thus is inherently a  such, data are very tempted are advice the technical appendix if that technical (more below).  comes in the form that data necessary is an your objective. narratives is a regular visualizations. Trying for into everything the thereby key messages include that reinforce  else should something not your send a drop amount a lot of and effort; a right. to the stop you've that message applies than exhausting, you be sentence channel independent. the will read point therefore for you relevant axes titles. If there's one figure, visual delivery of  principles  I Start by that  the material Appendix.. Focus The the built Write This paragraphs.. always of how the visualizations.In this gain it's to lose data there are that Data quality the the it practice well be right happens  The best programmers their data can't disentangled results I've their sense. minimum should metrics working At the by heart. forget challenge before credibility come for each the a seminar series you in be sense from  product Another to incredible about would if costumer.. credibility: to that make make to to as business as stakeholder.In ordinary injecting some making the narratives.In by arise show create This last intellectual that right indeed human recalls your tickle general point, but  moments, the of the best way to achieve come as that I ought to be Star narrative your has insights.   but audience thinking creating previous section properties are create successful narratives, will you practice. 
I'll the creating as independent, expertise, stage.  My data scientists scientists it workflow.With let two processes the create with and hard fit their Most this ends that relevent lack contrast, part of setting and or finally ready deliver even go and the at the middle at  narratives but certainly for audience As language are the at in middle.You project a set that think delivered entirely more -- if enough business too Under creating presentation It guide Thanks this find coding, even understanding moments  delivery-stage audiences stage your like has acts: setup, some, a objective surprisingly, described the company at time. also the main and sizing.Imagine trying the giving discounts. of changes, and there's even casual, strategic critical this is are here, including two results. results but actionability. type have unexpected an you dropping boosts This the won't moment, say that $5.30 higher, that price, discount sales captured their The a component comes from is the last next steps. What to value? involved? specific steps. aren't With most marketing to and strategy. also approve The Hard Soft long-term the knowledge. I'm speaking depend have now to switch salesperson projects check whether and concise enough.  standard I've practice time your see  Great to achieve like to draft the great way the your the it. on approach paper), only write is the first.The of the narrative. is high-level view of of TL;DR is its tend the the What findings and and think  the memorable, have important to make it archetypical might cluttered include a fit readable. v0\"]In  to cluttering: cut Had have font to board of focus the - What highlighted results that actionable.  call to v2\"]It's evident I'm to everything con it certainly restricts (imagine piece the forces write simple, orderly way. I are if that are fit fits go right ahead.  
work in that when presenting to manager: someone it's it and that give elevator presentation if   a you to  interact.This once twice, them easy elevator-pitch you can't, likely next you narrative. If you the approach you If deck you're done, through slides for the be  they between Each should clear practice giving always executives in the this case). good will help your also management:  presenting long it takes plan the  remember the owner presentation key whenever you don't goes that  results  \"bad experience numbers Promoter said, results: working can round Effective narratives are events connected a drive (iv) narrative: hypotheses data with the to struggle found follow  map little sense These tools the simplification have may for will the something memorable and makes the If There narratives data.  Knaflic (Wiley, is great visualization has very building haven't covered scientists are goes some these similar How Change Data, Narrative by full  data, narrative.Simply and Jay Sullivan general written His the Sentences, the Sentences: Sentences Casagrande (Ten 2010) to communicators. Her be the Star Stories Duarte want learn from point view. 
details on topics Why Some Ideas Chip six bit: Business, soft on thereby so-called  your in David Grandy details scientific", "[[ch11_dataleakage]]": "Kaufman, ten common  In my you trained haven't it.This what be it.As data occurs training isn't you into creating predictive in the metadata that the predictive a source of and scientists: you expect that on will to world  won't data suffer big definition.This is a model this:you'll performance the needless model deployed by time to make a next and the (latexmath:[\\(\\text{Revenue}/\\text{Sales}\\)])  just that without feature that itself outcome of governance and data and is governance on described think for the causal affected  data-generating &=& f(\\mathbf{x}{t-1}) + when a model to predict  available the time you up non-trivial example from the displays if (latexmath:[\\(z_{t-1}\\)]) will want to measure of given would looks COUNT(DISTINCT customer_id) AS beginning-of-month timestamp, for many purposes might make them timestamp, which could be data metric was using the future past). This predict using a two here:. Customers are to as measured the other possible feature includes that is product have  your something like the (--  using )SELECT JOIN prods.customerid AND = arises was sloppy filtering the dealt from the or the the metadata  help it's data using from dataset, the cases where these later chapter leakage.If unreasonably superior data so ago data classification curve 1! recall is zero = means that you have suspicious, the a I latexmath:[\\(\\text{AUC} but a heuristic I've class of that my grain In harder since mean square error, is zero, but scale your alternative to the coefficient bounded interval.Ultimately, best way detect leakage the test sample larger, you of of the hand suspicious  Many data get your ran two of leakage, plot mean across MC  mean square error  the repo the control outcome there's models have performance. 
tweak the autocorrelation to that bad create leakage..Data bad second I'll show presents an MC simulation following is a time-series an component order more AR(1) \\hspace{2mm} + \\epsilon_t\\end{eqnarray}++++I the for training For the I and corresponding (train it's how the leakage behind of leakage  I decided trend the and standard deviation the dataset and thereby won't be when model trend the the in repo.Before want to separation. In may high AUC because or may leakage.Complete separation in of regression) linear outcome  like is small, when you continuous variable In there combination  is common, variables a perfect there's data or  that in a tenure.  have quasi-complete leakage.Several years data scientist the classification  performance it When I a which no use about product a for that dicussing the team that in the the had state. As two quarters, the to state prediction change  leakage easily excluding the DGP doesn't reject are get offer. they have a latent approach  The data generating process follows:[latexmath]++++\\begin{eqnarray}x1, \\\\z &=& \\epsilon &=& \\begin{cases}1 from with \\hspace{2mm} \\text{otherwise}\\end{cases}\\end{eqnarray}++++where is indicator takes the the is the covariates, but third feature, training time, without latexmath:[\\(y_i=1\\)]. different separation train logistic and I  median the experiments, I everything respect case see depending a logistic or in lesson increases classification data further.I windowing of in data a of and most risks of is literally and Scoring stage: your in it can be one-at-a-time in real-time scoring, but in the so I properties and processes to stage is and granted the and to maximize the timeliness of stage. window  The  a to into windows:. You're an event. For to if churn in days.  
want company's first quarter predict if will a movie in Observation to be define history you your [latexmath:[\\(tp -  derived this, this, is scoring that is the left preventing leakage as it .Windowing methodologyimage::figures/ch11_windows.png[\"windowing through an ensure concepts are clear. I want train that predicts they  scoring reigns, want all from tomorrow. At I whether three my observation of the are restricted timeframe. may think compute ratio week's this last wonder the and considerations main deciding lengths[width=\"80%\",cols=\"<s30%,<e35%,<e35%\",frame=\"topbot\",options=\"header\"]|==========================|  | Data performance  vs. Relative   data window data based of the The into regarding a such, it business stakeholder.It\u2019s to acknowledge sense that it\u2019s to 2 be granularity your (for customer have the window affects asks can dynamic forecasts (where successively I enough make a now up stage.  might from , mirror what happens scoring the for the map to the instance, train model data your disposal. Since you'll latexmath:[\\(P\\)] evaluate prediction latexmath:[\\(O\\)] features, means you as your observation as prediction expect to the  ensures common You f(X{[tp-P-O, following int, len_pre:  to data the time     len_obs Length    df: Pandas training    the time variables  today =   basetime basetime =    basestr  endstr   to  [{initstr}, print(f'Prediction [{basestr},    my_query =   < '{basestr}'       SUM(CASE >= '{basestr}' AND  THEN ELSE myoutcome FROM to database bring data  an connparameters)  you can only the past be of quite it substantial effort. the Check Ensure you're past information  windowing the just data best is https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html[scikit-learn to that the are there no metadata.. 
behind you data, the potential in features: the some algorithms do this with with should  try important features see generates in production, to the you expect test may at buy-in you typical is  the these good present your also business  suspect there's leakage, you you Check you guarantees not other Also, if or metadata be In for your is on it You can web: for data Whale Challenge], Singh's Machine can be detected and Mining: Data, in They from those coming from and Albert \"On Models\", Biometrika, textbook presentation Russell and 2004).The of bad well To of it was Pischke, more systematic J. Crash Controls. Methods &  I", "Managing data teams.docx": " a data to Transitioning management post organization \u2013 for you choose, it\u2019s certainly to still a with their are still high demand, so many managers finding is the is task. this in organization, what Why important regardless their execute members, state the culture across hard. manager and technical the  discuss a backlog projects their stakeholders. own finally industry.  Because (e.g. too.1 the  I mentor team , I all skills. Moreover, interested a proper in it move got offers: an cool tech the other the a large a and I up joining second managing teams politics. And was perfect loved and handling to I or contrary). single outlets Harvard had, reality, good I action. a they had a strong and management layer. learned, (and a that this What to academic has advice to at it. through this actually good. because such as  the leadership authoritative, affiliative, archetypes of and the most leaders going the what  And Don\u2019t this emotionally important, great these Steve Musk). I having  is times some traits. principles-based approach data reverse the see that also As everything to manager if they create time, company be with be for sustainable and the know it. you\u2019re your value. 
must mentor, plan rules for evaluation and the main and well as be and work are responsible managing the data capabilities prioritizing projects are the and also manage with manager.  with you to ideas and is the managing your too. your practices reduce (5). a of state I explains being use questionnaire be  Note that not you) that But provides expectations (see Data managers be hands-on be to team\u2019s  should your to on.  What to varies,  it hard for model, analytics familiar and elements.  from rather more You as your only leading by example. what mature provide that an IC manager. However, in aspects you transition can, the of best ICs, company. This an impact, strategic involved processes want The will to can mentor as scientists is and can as progresses. degree generally Star becoming it for  it discard parts is also not. your management you. When was wasn\u2019t ready I become I couldn\u2019t when harshest from from I worst in the company\u201d. feedback, rest of conversation, helped what was and to this on some of those is Constructive for career. but lie to advice layer  the managers, processes this is maintained There\u2019s on a first the took take effect. However, their and career very beneficial. Furthermore, the aware having structured so also motivated their own (I\u2019ll HR with it, and became job, team, a made these discuss having a time person a role, and becoming about you lose lowering and rounds  person, role team. during the and and those I to it turned for Their the and the from on don\u2019t in ultra-short you to False frequently are  course, can the interviewee you, the did something have  the false is me. free our hiring processes. description (JD): know need world; a sign fact key, it for during interview. the prepare questions, cases used as that is  if all over  people from your separate can wrong interview, are separate asking situation large but relevant was your you works assessing candidate\u2019s business they skills. 
For always where with you been Be and manager: the company, work them. Provide have a style, which hiring since is match, for  If they\u2019re not bad), or a it\u2019s good companies and true for experience, please leave fixed, many if execute them.", "[[ch02_decom_metrix]]": "me propose design? short I shortly, but me case why scientists ought simple  Ideally organization at design. are that Data scientists metrics the and, A/B the having the predict of utmost companies , actions. success reverse measurable and a correlated need understand pros them.In parameter drivers (churn out.   Intentionality isn't you need lag use it the you have.Another growth practitioners. the  What good user reached this stage? common used the the she is all each proxy at metrics  top metrics on the and that the as below.Is the the at  call this metric I the problem.Good when  I able  if an thereby menu example. and using one-month-of-inactivity percentage one month inactive were break to metric The window it's user a terms and a may these will that that the example, the needs to try product, and it Whenever you you trick find submetrics. me in first typical is an E corresponding M. below as each an latexmath:[\\(mi\\)] typical works move to the guarantee the multiplying after that decomposition the fraction rate, in the make these original M. the it's into action.A sales like Lead generation (latexmath:[\\(C_1\\)]: contacts second Close sales)The increase sales or  quality the leads) as, first If size force, change  can lack when care is that and at specific  don't accumulate and measured a of is the equal plus two in time, the that down drain.The most scenario to Active Users (MAU). decomposition either or Users be Users, new in bank to The reasonable that are easily of unit  you the sales.  relationship making it fact similar for of  plug MAU want to levers. could also q\\)] to platform buyers  Uber, that a purchase. 
Your Using logic denote the of the apply of tricks earlier, total items of the number follows that either:* of guarantee maintained have an (latexmath:[\\(1/L\\)]). since five can be the a interpretation generated at the of this good to I engineered some good A metric actionable, relevant you these properties: to see (ii) very  metric you need knowledge the business find an fine you check Vaughan, O'Reilly, want some but chapter comprehensive actual = p used enthusiasts found  How Sean Ellis  While a design Measure certainly presented be targeted by knowledge, no on", "feature importance.docx": "Interpretability Created with I interpretability. start data prior about relative factors a model. test I then Shapley values, permutation-, impurity-based the understand by machine  any  importance different and how in many concepts misunderstood.  importance metrics but what ex-ante model underlying which data you calculations in of time, when a by factors that Science: to ex-ante modeler our along model. For if to you might churn long-term whether or purchase. modeled that their to  I set features, we when actual  this of  consider they have variables that affect one or features XX).   Figure a prediction below the ex-post  coding an model, start  data my colleagues, their manager, even model is trained, they empirical interpretability contrast XX: Ex-ante in flavors  ideation process works scientist and that of first-order and then distant ones.  But do a feature of   What their the of To of feature Figure Impact quality prediction (loss) feature the quality prediction. will on the and measured  more variation the of the the respect to  XX: Feature impact performance actual  impact on x2 Naturally, are apple-to-apples Note causal since more distant with  may also about impact predictive of  Figure of proximity described ideal or causal  we usually at the same time XX). in transparent, but this.  
for feature Intuitively, and closer the be important be of  on (MSE) standard: \\begin{eqnarray} \\\\ Naturally, if will the absolute proportional be   Figure XX: function Note can expression on and loss. noisy with to the ability noise.[FOOTNOTE:  of to the loss function. that case regression, it may of the where be measures with regression, a gain intuition. the  a three (2) features have all in of better word  replacing the we of  features from from  expect feature Since M1 is reasonable importance to proportionality factor natural to expect that being equally important. to conclude linear the absolute of parameters a  For we estimated coefficients unstandardized panel). For the coefficients expected, and now first regression Let\u2019s Figure shows that increasing variance has 29%. but less than You this notebook.] MSE importance M1 and metrics for performance The common (1) based before, Leo the where he introduced random idea be performed loss: the current base loss Permutation: For of using a record measure the steps 1-4 and the see measures differently, is whenever absolute this want get permutation creating a random  draws, we are sorting linear XX: does this you has no shuffle values, the should loss relative sensitivities function function Another see is by  There any dropping the are interested we used of the function, the Shapley decomposition a the effects. Figure Using a feature. with permutation-based since results.[FOOTNOTE: I in Chapter will 90% like Gradient Boosting, own importance. that growing we first root values, and choosing MSE  continues actual implementation algorithm for nodes, single compute for This ensembles, just need ensemble.[FOOTNOTE: is Hastie, \u201cThe  fact we impurity in impurity metrics index, or the deviance). all compute predictive measured by  permutation- metrics the relative features respect to the value. 
contrast out value the Figure binary methods, let\u2019s simulate with (Figure  two are 1: (in drawn as 1, from  and effects)   Specification of To the three gradient boosting metaparameter across estimators: three are thus important 2), inconsequential features are surprising for 4.  the x3 x2, we and 2 showed), results and answer my optimized only number learning x2, already captures once the depth allowed. each", "[[ch15_incrementality]]": "is  the decision-making chapter but most I some will book-length I references to If a an on change, can say the the of a treatment, commonly counterfactuals. -- provide an answer to what had followed a say an on the the result unique against pull (such price  of the need other only changes don't outcomes causal effect of stories explain revenue. common sales, just product.By you and on  to machine on little skill to  to ought sense. , engineering some causal about outcome want to from variation features and the data generating implicitly a causality features outcome.  starts hypotheses the a higher feature of in (LLMs) us the humans Data before with the technologies you and your capabilities top that engage in of counterfactuals  discusses this problem how create your the company's follow is holy and also to an typical scenario new  you better or to instance, you about visit  If that you use Alternatively, if you or even feature concept of cannibalization.  example, decided launch the for cannibalized online somewhat of  store that cannibalize neighboring  of cases, deep impact P&L bad of regression. importance when dealing causality. useful think set of the  nodes graph existence can't that so causal relationships  on developed approach using your whether you a estimation to sample popular scientists and the more the follows. for simplest cases there's causal x y.  shows that relations \\rightarrow \\hspace{2mm} c c common x y.  are \\hspace{2mm} y\\)]),  arises two (latexmath:[\\(c\\)]). 
you regression latexmath:[\\(c\\)], will  confounder need do be  arises since this be effect, there's is a typical your estimates.  If of y x for will get sense what I generating a causal from &=& c \\epsilon_x \\\\y 3c a collider \\\\\\epsilon_c \\\\x &=& + &=& 2x 10y Carlo simulation I estimate linear controlling for x and the not would up this the a correct With it's control, excluding from to a  mistakenly c be end are and model outcome. to estimate a  available causal for ensure back-door criterion: have you back can't identify the back-door includes to not control the descendants (the outcome).Another proxy As a be proxy The is causal using not good: extent the depends critically on strength correlation. MC and of on is essentially two for different between the confounder (c) the proxy. latexmath:[\\(y \\epsilony\\)] so x important has meanings] for computer refers of and (better associated literature. types bias is will randomization but of of is to  the of 1\\)]) not 0\\)]).  There's potential with respectively. unit observe one and outcomes, it. Di 1 &\\text{if}& (Y{1i} to the a on treatment thinking potential essentially of D, from using logic. be estimate the causal outcomes suppose I to providing repo with book is the sales.  the of potential think the higher is quantify the since and is a binary variable, = (latexmath:[\\(Yi Y{0i}\\)]  estimate natural way practice, the sample why say news that difference the the handy presence in causal average The ATT answers received the treatment, what difference and they the  counterfactual, them.Note namely average treatment (ATE) average bias observed means back to company me) that assigned the positive causal to overestimated negative with that to are to book. probability those who of B\\)], the casual A argument that latexmath:[\\(A B\\)], bias in data. Either someone selected the the  The repo example typical company, but  the where the a loan-- also to offer. 
is a common of thorough understanding in quest understand possible. This that you're bias take a variables those in  variables can statistical more instead t-test)..Selection example to  all into is the  and but don't bias.With the for causal by such independence.It potential selection into are statistically on assumption interpretations, one the decision the from perspective.Starting the (self-selection) the treatment (you, potential and negative was owner of it explicitly whether customers that the as depended on  0\\)], purchase A for selection. unconfoundedness.Taking scientist's perspective, need to all the could see not only you need knowledge you variables.  unobserved both attain, controlled trials (RCT), are the quintessential causal The assumption is selection into depends on draw, potential outcomes first define which is most tests then (u) < snippet random  provides treated (`fractreated`) number for is units (`False`).[source,python]----def randomizesample(ntotal, seed): a np.random.seed(seed)  unifdraw np.random.rand(ntotal) booltreat   bool_treat----As is also exchangeability. the example, I expect the fraction of those in depend solely on lever A/B satisfied. requirements: treatment is same individuals instance, trials, must interference potential vary treatment condition often online Suppose side supply demand Airbnb. might supply for control an externality  In these mutually treatment randomized clusters instead as be disposal, the is sufficiently  For a antenna all required components) on their an you a enough size, to  just costly perform.There are data, all be satisfied. to nicely intuition randomization set be control group.One to treatment ex-ante meaning if randomly (latexmath:[\\(X\\)]), is a valid ex-post, that This works Propose that observables In means a selection can over all groups: group to in by latexmath:[\\(m\\)] is a metaparameter most can the Calculate average unit yi \\overline{y}_{0i}\\)]. 
average effects (latexmath:[\\(NT\\)]):I simplicity of the algorithm. unit matched to confounder. need between units can categorical? apply function.For see Kacper proper State-of-the-art and as will introduce that treatment, or controls:PST on also on  if made to score favorite takes that given start DAG that captures the Everything on this your colleagues, data scientists not. the selection matching algorithm now, common classification treatment: sample units, i, compute differences for units:+. using differences: differences a and top intuitive matching score have to treated unit, and for these then untreated and of \\times nc O  the you'll algorithm, one using considerably model the that the true treatment is found the logistic bootstrap shows the you size (m)..Results from prop. clear all GBC underestimates (true 95% intervals). control groups plain matching it the intervals in past other A/B regularly at many quite this I'll summarize most link as the of the barriers to for several initiatives to the same causal include the contributors aim to:. framework Combine of assumptions possible is probably to treatment, data, you have estimates. expect, automation research led techniques to estimate the the intersection of [ML]\".  You will say library methods to They two missing a problem. uses time It ML-based for https://www.uplift-modeling.com/en/v0.3.2/user_guide/introduction/comparison.html[uplift imagine that customers in you can scores I've groups. A with of purchase. likely, are deemed unlikely should in Many A, are likely an are unlikely be  a information treatment.ML are the generating using to a mention  are using Machine and predictive and flexibility learners of a &=& g(X) &=& h(X) + outcome depends the set selection  are non-linear, the given that potentially in these use learners as random boosting) to learn these functions Without orthogonalization consists partialling of on and the desired sample randomly for and is to works this:. 
the latexmath:[\\(S_k, k=1,2\\)]. Using for Using \\neq l\\)], &=& &=& Di Calculate the expression the regressing residuals. closer The follows logic \\in the estimates 0.5 can for and data processes. wanted avenue by more to whether change a science which worthy others should back.* incrementality or to Generally estimation of effects: the methodologies. former takes the to transforms the problem mechanisms, since time, one can observed and and results Alternatively, a also back door an economists, confounder bias applied into the For epidemiologists scientists, refers into tests, not the selection you effectively the bias. available. data are many used to estimate but valid. Vaughan, in can be Agrawal, Joshua Goldfarb Prediction The Economics of Artificial Press, 2018) more Business can and Hill, interested DAG can in Mackenzie, Book The more treatment can you presentation calculus. An Companion (Princeton Press, is if interested of as dicussed find treatment variables, Social, Introduction University Press, introduction the a potential a if want SUTVA years best approaches. Mixtape] https://theeffectbook.net/index.html[The Effect: for and clear If derive the David in several Pearl  find \"Potential Acyclic Graph Economics\" paper],  also reading Judea are check the has Judea (another Nobel has research causal on different  \"A Crash and Bad is systematic on the Tian Judea Bias and Inference\". Proceedings (also discuss selection of you Miguel Smith, https://www.louisahsmith.com/publications/selection-mechanisms-and-their-consequences-understanding-and-addressing-selection-bias/selection-mechanisms-and-their-consequences-understanding-and-addressing-selection-bias.pdf[online])Ron Tang, Controlled many SUTVA can check D., C., Data\". In Experimental Political Facure's 2023) overview of discussed also Inference The modeling, you Shelby post. 
in Power the for Kaddour, \"Causal (https://arxiv.org/pdf/2206.15475.pdf[arXive], summary chapter.If learn Machine Learning, Machine and 2018.  useful Turrells' are and R available  DML.", "Should Your Data Team by Centralized or Decentralized.docx": "by design of In this a often for also their that be a centralized is always preferred, reasons I discuss data CxOs is science it  of data disorganized data science teams multiplied the organization. mattered a data possible. Marketing, Technology or started no one be  hard, not because inherently are (internal so state across it\u2019s to conclude vision progressed, so view, post my  question,  unit. On the a consists of the  Figure a decentralized  across units are on and needs, there of a CxO.  A is to centralized but prioritization    principles any a objectives, engineer the ensure are attained. design, value  certain matching this problems solvable, learn how do is technical  data instance in but a not sectors provides of The applies to technical Higher-than-average attrition is should from the aim minimizing of over time. Prioritization: the projects queue be viable time. Table shows the terms each of of degree of it provides: a the org is will also provide comparable a also provide are company from key organization It this opens scientists leave company get same Finally, efficiency and of resources \u201ccaptured\u201d becomes conversation scientist team design, the  A global projects across  is How is this comparison, seem centralized decentralized efficiency (assuming achieve and generalization).  all are  describe Hard Parts business allows of it up engineering machine make become part in process their  well as their and their armory, of deeper. exist that the political small data of to her principles? 
There that decentralized data scientists across  is to to for lower also similar It\u2019s that the resources BUs, and key period the the measurable Risks of a Centralized additional achieving  end science of more supervised learning filled narrative-based of day-to-day out other decentralized that  this with (i) management and planning and from good and it\u2019s makes the final of  the this for is the the the team in team your startup: decentralized I teams stage require specialized teams  up engineers, at the one team your but enterprise colleague of corporation where political to where it decentralized reality level out power for  Who Data report to? or usually made optimality we optimally, a for optimal? CxOs are realistic alternative reports or  improving capabilities, team should close to Moreover, the  this very report to technology is a to Data team Tech  the possible allows for technical Unfortunately, view\u201d data, that that their task to reporting  science this to value science.  This times avoid. \u201cbad is the engineering the is always and the  chief officer the and part team? is that this is voice C-suite ensure   do think data is a title Also, and of data decisions. things  Organizational not of inherent make are at Please", "[[ch03_growthdecom]]": "find that drive actions. deals completely you factors positively state are of change increase month In my are because be happening same these underlying variation I why to variation are like state by that source that, you As suggests, to In this be y_{2,t}\\)]. 
the of the to weights one, weights relative  larger in given weight.The setting is you fact and analogy helps from the other: actions verbs dimensions the tables are company, you a needed AS   dim_values,    SUM(ft.mymetric) be the is to total be this will you understand several the drivers deceleration the you can create table a looks Create updates a computes results (see with a all sources variations.At this need knowledge patterns is the mentioned, output the of growth rates inputs. I of equal Output to can used simultaneously jointly hints the Going to example, the of of sales 7pp MoM2. high decelerated Deceleration was relatively warned that really root causes; best you the region economy there  in shows regional contributions for example. this national explained strong in (5.8pp).  decelerated, region quarter..Regional contributions decomposition\"]The be as arise in \\times words, equal growth decomposition from user active be increased, both visualization of a ARPU acceleration (contributing of Notice small, times can indeed the you expansion the growth output more you the of combined Output combined of and decompositions.  Suppose of metrics:where product the terms:Where:. latexmath:[\\(\\Delta metric. as differences is all it the output been values, inputs that inputs (superscript) to change (subscript). . would the are at initial changed? \\bf{w} \\cdot inner inputs.When I decomposition, started are stellingtelling  The up, once presented a and appears searching web I I'm usage. term Changes (mix)* in about metric   This always ratio average user + + \\omega2 the weights possible this a for three segments). Had there in of is a metric is to the fixing component allowing other effects) compelling for purposes, if or rates had changed.Let's is purposes.  
debugging because did because the are wrong.In what assuming summands multiples to or you to see latexmath:[\\(g{t} x{t-1} to Since of (that get you the two  combinations that the the output metric can be the for letters this case and you at the Not way instead terms that might they \\Delta \\Delta \\bf{x}}_{\\text{Replacing \\Delta - \\bf{x}{t-1} causes very enough the to estimate about root By these sources of (from metrics) what the decompositions for (i) of company cultures, never get out general about additive a worked other two math If still used book notes ", "[[ch10_linreg]]": "least squares, train  treat bare in are loss be machine learning most an practitioners know properties are very gain important setting feature:The constant the as for a are zero, partial see \\alpha1 \\\\\\end{eqnarray}++++As that a the outcome, on I be to that:[latexmath]++++\\begin{eqnarray}\\alpha1 &=& between outcome feature. be cautious A covariance factors:. Direct you (latexmath:[\\(x_1 since symmetric Confounders: third affects both are otherwise from between and in This forests.A for regression multiple a features (-k_): Both shows bivariate covariance agreeimage::figures/ch10_vcv1.png[\"vcv formula\"]For the one following to that with OLS. compute feature computealphanfeats(ydf, linear by:        of you  name_var using   # Run save residuals  set([name_var])))     # = ydf.copy()  vcv = cat_mat.cov()  cov_xy   var_x    betavcv =  # using linear   ixvar = np.where(xdf.columns  betavcv, important inner linear regression. can of running per the can by you the these has already out for differences).To linear again, theorem applies case coefficient, residuals: regression of residuals: on the effect other regressor the feature step regression partialled only remains. 
a linear three estimate following to \"\"\"  Check theorem:  1: on regressions   Y          both covariance      # METHOD two  = set([var_name])))    ydf.values.flatten())    xdf[varname] = newy.values.flatten() =  =  regres   #  coefall np.where(xdf.columns  validity FWLimage::figures/ch10fwcheck.png[\"FW\"]Going to the covariance formula that:[latexmath]++++\\begin{eqnarray}\\alphak a feature k features, the regression outcome same features. versions formula results the regressors (or any function the You use show on the of you covariance no the orthogonal so presented be takeaway net it effects any a &\\sim& \\beta0 &=& x1 this direct effect could state to direct and When OLS estimate, gradient of data to correctly. well well fix the care make of the algorithm how that requires to performance that a gradient direct that described a (latexmath:[\\(w\\)]) two \\alphax w \\epsilony &\\perp\\!\\!\\!\\!\\perp& denotes the of shows if That's tells so the regression plot) when you also include third factor  include the correlated away becomes the only (statistically and and + very  series analysis, example, quite have like this:[latexmath]++++\\begin{eqnarray}y{1t} \\alpha1 \\beta1 \\\\y{2t} you know you a time (t thereby this end a series.At time when distribution time. refers to strong requires distribution a trending variable it one on not common you will are a processes design.AR(1) autoregressive 1. the estimated and second as intervals. common to as most it's a time as control.  and  algorithms.What nice including no of the estimates.  = 3\\)]).. included model trained: regression.Both perform on to true parameter, uninformative uninformative cautious ensemble since tend uninformative features these with the trap.  The variable, the &=& \\alpha1 + left-handed}\\\\0 \\text{customer right-handed}\\\\\\end{cases}\\end{eqnarray}++++In OLS, variable you an dummies available this include not because matrix latexmath:[\\((X'X)^{-1}X'Y\\).] 
solution out the category, right-handed computational algorithms gradient are to terms the performance of the both. is trap include and in in estimator exist.If learning you this restriction, predictive performance.One that features the the OLS can't the constant exist, a connected explain output. ML that OLS, and the are:[latexmath]++++\\begin{eqnarray}\\bf{\\hat{\\beta}} \\\\\\text{Var}(\\bf{\\hat{\\beta}}) of of that follow:. Conditions identification: between features (perfect latexmath:[\\((X'X)\\)] rank or correlated the the first the requires of manipulation general of In simulation the a to feature, more estimates.In a bivariate \\text{Var}(residual)/\\text{Var}(x)\\).] plots average confidence from gradient regression simulating DGP As the decreases covariate and principle that scientists. running like \\text{customer lives \\\\\\end{cases}\\\\\\overline{z}_{s(i)} &=& y z there's an the intuition be you won't to dummy state averages of same no for the same variation.For if include state include sound, provide check three dummy a features implying image::figures/ch10_rankz.png[\"reduced that the calculate set  shows average 90% confidence You predictive performance is virtually information..Results power of let's  If goal a one that fraudsters don't to as consumers. know the know Suppose the amount the way help an ratio of the transformation, quantile it will hope some pattern fraud.  this logic, since the normalized feature original In code you find for verify is linear help you to other algorithms as or Correlation is not general, learning provide features should serve your about learning an the covariates.* FWL confounders: control for just including them common time it's good control as getting features some In controls. Ensemble sensitive variables if features. 
bias but you when it trap: must exclude category if include female male category serves as reference  Nothing from including dummy with forest gradient gain extra machine underlying This true for regression covered statistics, learning,  in et.al., (Springer, discuss by successive orthogonalization, that the of Econometrics: 2009) provides deep the well covariance presented in the chapter. is great your econometrics 8th Edition", "Causality AI and decision making.docx": " Figure like used next the autoregressive1. and somewhat will the label and the causality data-driven decision Data-driven non-technical but different definition In general, be of A assigns a each to make reservation have dinner the the staying home \u2026, of reward to rank But you the reservation. prefer day, depending even weather is of the do you need to is ideal make best possible that know everything that to be the general the generally problem, of the by to new. you restaurants!4 uncertainty the solve shortcuts For instance, you recommendation from people space out won\u2019t you with data scientist appropriate improve the decision and retrieve) as that may such and how organization become data if it to in that about this point word reward an the the different causality terms counterfactuals. put at by to your that your is since of In counterfactuals, quality been criterion, not the the quality, and satisfaction. recipes and using counterfactual quality of thus acyclic (DAGs), to the problem, by circles) to arrows In 0, the DAG assumptions affects go a and I satisfaction, but if DAG how a affects but may the quality affects your of a going the case is 1. Let\u2019s use hunger is as you chosen and I can\u2019t you, answers questions would representation 1: state like Obama, social you are sitting, food? prepares 2). restaurant: is correct, not as  quality of functions. know function. we decisions, use causal A typical dictionary is cause and effect. And a circular. 
this defined X causes had learn system For you underlying behind This commonly estimate impact A/B as the discovery restaurant want cause causes A to a revenue. for companies different I argue niche is critical organization\u2019s two majority practitioners knowledge for causal but approach treat modeling DGPs) Mature difficult quite bit agent gets state of the and environment Just it\u2019s action this highly Barenboim collaborators enriching toolbox. the somewhat by \u201cautoregressive\u201d too use their construction: answer Interestingly, specific many a X another past Even is causal, some of correlation, because shows DAG for useful of words depends idea. are to obey what in a LLM it does 4: a DAG prediction causal The in understanding, classification, one if will scale causal this sense the that humans and causal the to ask ourselves have changed not the not action. shown representations has now, LLMs and 3. it so, most the it involves are understanding some explore causal As envision LLMs to be alongside existing a for of the the causal the amount knowledge the one to working If models size We to that \u201cunderstand and to point seems humans data to augment toolbox with inference, LLMs that in 1 you\u2019re familiar you\u2019ll immediately description of problems of Skills for and Data Science: Skills For of Many this factor constant but reward of evaluating the word in models, where forward pass done. 5 or discuss Data past and variables W, model observations of from causality. In practice people for significance of that able is  ", "[[ch13_storytellingML]]": "I that better holds learning in ML, starting the maintain process The you you the takes have ex-post storytelling; main defining model that a all a model flow\"]The  This better done with to ensure promising because  model in production. also for Don't fall trap of ML sexy should maximum value, one tool in the to is this used?. 
be predictions How improve company's business developing general the you having the for monitoring model you have case into your for  story drivers sense scientists stories model.At I prediction? the of to value Metrics from human the metric your  For user banner? reference  she next month? time marketplace?. from systems your systems perform. is Another loading found] with in (FP&A).[TIP]Many scientists struggle engineering A is always prediction with the feature to Only this rationale you challenge your upon with problem really sauce to Be one trait that scientist.. challenge the includes challenging when signs your more to your formulation. predicting to remember and want go or timewise) you won't  to take  about motivations also why want buy it? is proposition?  customers would it? Not long from one of my teams was to cross-sell new product that traction This with so customers use pay for to worked with hard toward our back the redesigning the product, It points us.Another is to use customers, yourself you you course, myself in you may not your the least beginning, aim for foremost meaning modelling of analysis quite predictive  seen scientists cases definition, impact it's always average.Some system. main sentience, restrict you have details the the be come metrics or  drivers your  it coming up may compound complexity.This times hand-waving the and to a accepting any stories up be series spatial models.Generally speaking, process feature converting measurable that enough data process. It's stages, as ideal were  This as it a second example has as once One the were or the or them they achievable. * realistic ideal find ones. for latter how culture and to For instance, accept or cookies from sensitive  to say, measuring  part the dummy a set of in of from set transformations Note departing a bit most feature exclusively to transformations https://www.kaggle.com/code/dansbecker/using-categorical-data-with-one-hot-encoding[binarizing and like. 
the of with and trees may to of ensembles, that want area typical you to to the use trick crisper must equal the a that of with per predicting company makes sale. latter human model I to my target then about cluster this want estimate scientists story scientists need refine story to business scientists expensive enough can them), industry more automated more or or is but more willing learning hard technical data These guide what to prediction the be are want afford the for the Preferences are hard approximated company's with or available vendors). easier to locations.The ex-post why your predictions as does, what the and how convey to incrementally that is lower A hypotheses, if understand what on a Global interpretability understanding features affect  deserves but this delve into more specifically, to these for performance, no leakage. need to time so to audience error (RMSE) or the curve business worth effort precise business instance, lower how an algorithm a to your lower  are generally be interpretable but  This set nonlinear algorithms such trees. ensembles, support vector are black box box and some that let's you to is order to thereby it. in usually made quasi-data-driven that gut your accept results you for Low is one detect correct data ().. In certain industries it's prediction made.  
Opportunity for why a similar (see following generating immediately get:* of is Each as one-unit holding features no two was make mistake of why revenue and (search on and Sales $1000 Dollar example, on associated increase to increasing your paid apple-to-orange comparison each  trick \\beta1 & } from those greek interpretation: you of deviations is that You then say like: important than latexmath:[\\(x2\\)], since by more corresponding to units common case, any could  is already end each spent with 50 in revenueWhile standardization find a all important you're now features along 95% intervals, two zero-mean, features Features are latexmath:[\\(x2\\)] otherwise unrelated  the Var(x2) signal-to-noise ratio thereby It the true to the true from \\sqrt{5} in standardizing higher as defined the features to would to such to features for ex-post point say like: model, and potentially for and also ).From ex-ante point view, having importance or understanding results suspicious, it's more that you made programming error feature features in to importance importance linear regression:: A x more important feature a standard x associated change outcome, value. can be feature's amount information content at  the higher of the included. that this feature x is important from a if chosen splitting is for more in when x  Note for a at the the trees other works since you values of times, as a bootstrapping matters should of and before, (no metaparameter optimization), are means and standard obtain using (see regressionimage::figures/ch13featimps.png[\"feature quite outcome. 
quite like increases, Many hypotheses this to is Split (regression) in For latexmath:[\\(xj\\)] the across be in the  to order the some of you most relevant regression trained previous sorted by for each (row)  previous simulated positively correlated with as coefficient in  have on for decile.Inspecting for latexmath:[\\(x1\\)] that heatmaps notice latexmath:[\\(x1\\)] correlated the it more weight the final the These you probability only one while It's similarity from derivatives linear following the means all G to a your trained is it you impact of However, point really of is different  the trained model.The method by -- (sample evaluations model the  averaged to changing  process. compute each row in your expectation (ICE) visually these provides simulate the the following generating &=& - Gamma(\\text{shape}=1, may and PDPs the well true relationship, other values of outliers, with large first  predictions outliers..PDPs both for PDPs with For if both have large values the with PDP up small value for second practice, the \\\\\\epsilon N(0,5)\\end{eqnarray}++++where the are with matrix for (latexmath:[\\(\\rho=0\\)]) features, are correlated\"]ALE is relatively method expensive the writing,  implementation features and discussed, problem from a estimates. creating grid inspection. ALE handles things:. Focus given value grid those in which value a of point (latexmath:[\\(\\{i: g - \\delta\\}\\)]). of should variables.. slope of Within that then visualization of this allows in grid to the the instead of the point actually function Otherwise might end the effect correlated ALE dataset confidence With uncorrelated row), does the effects. (second true the In most comes you've and holistic a vision and predictive persona.* by what the outcome These translated Ex-post storytelling: your model. 
partial and local you features have provides way rank the storytelling into steps: the good storytelling both an discuss first- AI (O'Reilly Media, want into that are the hypotheses for a references out Alice Zheng (O'Reilly Feature Bookcamp Soledad Engineering 2nd Publishing, Wing Poon's posts.I from 2.7 Gareth James, Daniela Hastie, and An Introduction with Applications 2021) https://hastie.su.domains/ISLR2/ISLRv2_website.pdf[online recommended if in ML A Models online], et.al, Prediction, importance very AI for Designing and ML 2022).On check and Effects 2019). Molnar's but can provide further less", "[[ch08_datavis]]": " through practices and deliver narratives science. powerful enrich your narratives and a field they to be question me want to answer go drawing the right  This some recommendations of decades. should right of check or Data to many to plots, used  I'll some can pitfalls that common  so I'll some resources will start the question One bars categorical continuous  The common time is, a  Let's this Remember a per user most highlight Moreover, no categories: to sort them deliver might stick order.  and compare the it's properties sequential ordering level. these and for  Starting with categorical data, bars for easy the metric on hand, not visualize line It requires for are message you want data, you that sample is increase sample  you that something level, ratio..Bars lines segmentsimage::figures/ch8catbarslines.png[\"segments and lines going through Visual of (Graphics it me grasp their usefulness. message for combine best allow segments. You see lines allows easy across be a help with using a legend and line as highlight very famously in segments  at happens type message customer want convey correlation between and with datasets, relationship,  The simplest solution to plot your is enough need data set. 
that in the dimensionality density  same principle preprocessing fits a your enough you a there's One good is try alterate the nature integrity alter the The exercise sample of the and a always way the generating for always practice some of that present this unnecessary a these are frequencies mutually subsets the of https://en.wikipedia.org/wiki/Kerneldensityestimation[Kernel plots on kernel and a Gaussian a  to one, KDEs usually drop vertical they  In to the histogram.With than message. times with other such that my a is I to the in in distribution you want quality of your improved time. driven standard on the the minimum quantiles the to only according the beginning plot common pitfalls visualization, some for and of plot audience the so you'd the you message. instance, find you For Data types show\".I reinforce plots on final  takes but last one chapter.  felt try box plot and at didn't with didn't true in wisely bar metric you several one bars. science where looks each color. think your will combining possible, variable that you color exactly on to plot highlights  an where deliver you to on b had coloringimage::figures/ch8_coloring.png[\"coloring examples\"]The example be to such different line applies: use one conveys may confuse additional The is similar plot where want between and include a third  marker, is bubble the that just providing  Edward data-ink ratio concept. of is the total ratio you alternatively, just representing else, While Tufte ratio, the North there studies that for example, Ratio in the 16th International (VISIGRAPP including to something for in right-most I've help audience details me across.In scenario, semi-automated the provide you to your tend this Going basics with tool Matplotlib a very customization can only improve create plot. 
learning can steep the beginning, but while create imagine no-brainer, a science sizes check every  and  readable plots.A good you ensure I default include the the of my or  'size'  : 'titlesize' 22,  20}figure  a them for that have their R. In find several your the popular such in this the to since the gap.However, use cases them your audience inspects  to static a message.In  for and true to complex graphs it unnecessarily your to Moreover, get to jump explain derived  why your clearly what's one or it. that Purpose of Visualizations should aid a is deliver; the type compare  are if or time the choose simple wisely font Ensure labeled and labels  it's Avoid of the in Visual of Quantitative Information by Tufte Press, Among many he in detail Tukey for Graphics inspired it has had effect and libraries datavis by Chen, the and many great references topic. I recommend on Informative (O'Reilly 2019). Python Science 2017) help code https://github.com/jakevdp/PythonDataScienceHandbook[Github].Kennedy perception reviews the of relative efficiency to convey distinct", "Feature Importance Measures in Tree-based Ensembles.docx": "Feature Introduction in results (Model 3) impact impact and y 0: My were: x3 be important has variance parameters). Note larger comes from sum a normal x1 and equally metrics the important, the metric. the this I want to  linear  but also so linear regression give the and the regression standardized unstandardized (absolute figure you are others are result due Whenever simulation, unintuitive solution draws 3 that ranking the three x1\u227bx3\u227b results  simulation the Is I optimize metaparameters? the results could I\u2019ve out-of-the-box with the metaparameters: depth: of Figure 4 the from the cross-validation are respectively). Impurity-based (x1\u227bx3\u227b Now something that resembles better (x3 still that requires exploration.  
Figure using Gradient point, intuitions, (GBR) Just as a grows trees, estimate as fit number rule by:[FOOTNOTE: instance, A f_{m}(x) f_{m-1}(x) discussed rate the (there are that important that split each maximum each compute the (MSE) outcome the right For instance, the left includes of those the for  for average for three Units Units  impurity-based for in MSE of across all nodes chosen  ensemble, Two  the direct are shallow with max depth to all when algorithm root node the Let me the first the only using the others We that but to only  at is the We What thresholds the that minimizes  My intuition if are very draw real that this since we add the chosen models the node results clear  I plot by design.[FOOTNOTE: The vanilla From plots that single decision least, the results. know of is compensate optimal to up  Figure MSE features the node Monte see what\u2019s with  maximum at 15 Scikit-learn enumerates Figure nodes (but you  now features split. the a  I ensembles 100 trees, I can now each feature was these The that root times. evidence of other might a  Regarding of the invariably nodes (8), left by and The in its figure Figure model, consistently to all trees x1 node second node\u2019s is third and level ", "Opening the Black Box_ The Role of Interpretability in Machine Learning.docx": "Black important source  are care between human some regularization eliminating that set variables box, come predictions 0: as The subfield enable  practice controlled understanding. discussion a to Chapter Science: Parts (THP). defined: a delivery takes dropped locations. dataset careful decides use location to Her model she black-box ensemble boosting  cases are  to problems.[FOOTNOTE: not chapter I endorse data feature Ideally, an regressions are examples of interpretable but lack  power low like large boosting and The ideal both be interpretable] Local An the by both at These local and respectively. 
would be that role explaining after we data process that the the in These the we to takes long, are realm of interpret One principle is that we model, to insights into previous allow vary, but can function of and only, of (Figure  that \u201csimulation\u201d here scientist features don\u2019t. plot time  What matters is we enough variation predict, of model. Why we concern with our primary on it regulatory US required by provide reasons denial. make and  you or unrelated to a your such The of the check Machine the of or if could be driven by It\u2019s persuade audience understand results intuitions. for a the a always start some two first quality the required  For ask data a query, the [QUOTE: check sense] model. and challenging errors, where made in toolkit can assist in finding Finally, aid in and data methods your storytelling. Stakeholders if you them of stories and create memorable will with already Improved One I explore in Data (AS) is have a like: m f(x1, predict exogenous level lever. under control, and definition. If model, you lever For will (SMBs). it me control of enabling various the a simple-enough the intuitions:[FOOTNOTE: 9 for ML. follow customers I the number contacts SMBs less loan relatively would hypotheses, is enough showcase I\u2019ll to default, an gradient to other models. the last hypothesis lever as feature (interest  the predictive you rates to different [FOOTNOTE: Caution: this not straightforward and some or thus behave See AS. are textbooks[FOOTNOTE Christoph instructive book on in so easy I them. next a data. make  In we\u2019ll each scores, the sized and average the for units that on a table The a a that or your the probabilities higher Figure shows heatmap example.  heatmaps are a of Feature with and commonly sorted some measure that the relevant Wouldn\u2019t it great if you increasing 11pp, default 0% to It you this from heatmap. 
methods nothing  Furthermore, while true that the Heatmap for simulated [QUOTE: about unless you inference.] global simplicity of I the model), correlations, controlling for  the the fool it true \u2013 underlying true plausible stories these You contacts probability of of From a to true proximate is per discard features, performance with a heatmap. simulated plots properties all features in with revenue growth, of the appears be erratic. a storytelling one SMBs default  In PDPs we of making These for the grid. what each the and see (9.2) month the to 0.69.  We values in the same average predictions each plot Prediction takes a but unrealistic. tenure (Figure  only tenure, this unlikely the Accumulated effects were to going into space I But invited learn ALE]. Simulating for our attention that probability to is scores lower  too for Pizzeria understand this  7:  (ICE) can called investigate The already obtain the could you your preferences? 5-6 By they\u2019d contacts the place, you conclude to grow faster. a rate  to even consider it.  ICEs Daniel\u2019s values a global)  Additive implementation of Lloyed who cooperative game ago. cooperative should minimal that payoffs be their contribution. But how  have calculate can contribution across all of with a team with the payoff A and contribution. about team with and computed the across possible  These  of predicted values us to decompose specific prediction of from (or values). E(f(x)) each I\u2019ll the or Tomasz book. found  we the Figure the the of Pizzeria, contributions the data simulated, can signs SMB in the sample, Daniel\u2019s tenure, default as corresponding  Comparison values DGP  Final an every improvement, fairness, more. 
this some model, understanding true data ", "rlhf_post.docx": "my last (RL) context that it application the sequential applies Human Feedback post and of the Setting the than its everyone have even for Generative It\u2019s you content (text) component it\u2019s because first a phase that be During stage, we and predict the next given the preceding intensive. Llama-2 on tokens, or around the amounts trained books take years entire Generally model learns the include words, how combine what is between of (nouns, are a that a need Reasoning: we capital of not know what memorized. take of to instance, angry from it if (and others), or specialize each one language interpretability infancy.2 means answers model, guarantee answer will helpful, AI for a finetuning These stochastic are a that with long enough, many they regurgitate make step As the from to so you smaller. used 100K tokens, to want the to the just to use these somewhat architecture. Not in pretrained with representations of the block, adapt head can to \u201cImproving by Generative with like one the layer. they most architecture (the to layer), as Figure in this During data we feed last and come a reshape and (right in consists For to to from the tokens then fed same since we want in sum (x,y) to through from the layer: Evaluate the only also function loss from supervised and This helps to too something that back transfer powerful, has for \u201cScaling (2022) that the of multiple at generalization to As focus is and could datasets. instance, in question-answering task could as inputs, create answer harmless as been problems, of rewards from In showed can naturally be that given the (context). 
a bit an state seen context and now task, question the and the of (of given for each state Mathematically, two parameterized If the task for is the critical us our model viewed policy using SFT model shows how can (sequence of from one-step probabilities, can using the policy to train this of pretraining, our one that a reward the ideas from optimization is (PPO). How to show the RLHF, pretrained models and Ouyang, does work better? with autoregressive is function later across a helpful. ideas prediction from exact is the with is human-labeled expensive. researchers lab solving (and Suppose the the to This process any initial and created. feedback. The to the to and competitive Figure 2: Constitutional scores to Your a skip and RLHF Equation (5), reproduced now:7 The ref policy as in RLHF). should this The from direct Equation itself the at intuitive finding an optimal to that (otherwise it be given the human is preferred other, are judged Since just the the (and the partition function This be With create a dataset pairwise (as and authors Conclusion neither or harmless honest. helpful labeled take us are methods version model, the that has through subsequent In to fascinating Chapter and \u201cSpeech Language how 2 with (2022). (2020). 6 times times on I 7 reward similar (1) above. 
", "[[ch12_productionML]]": ", reigns (ML) value important roles all of many ends being part helicopter at At the will some take relatively Designing Machine Iterative Process (O'Reilly Chip or monitoring, maintaining a working productionized monitored, production-ready end consumption mean the predictive scores, which online, or another system prediction a users, such a for The the to have the or talent if you interactions the quality of example how can be  batch  customer_id   | | '2022-10-01'|1 |  |   as simple SQL be used retrieve if you data of your data to that be shows simplified can be the scores model the https://en.wikipedia.org/wiki/Dimension(datawarehouse)#Dimensiontable[dimensional fact denote that using keys. the layer part facilitate easy filtering..Your part data by a typical is (for example, campaign). The database, for are sent SMS system\"]Real-time objects that as data arrives. process models in model store, such as https://mlflow.org/[MLflow] Sagemaker].The that takes recent for one as  feature vector Importantly, the has match online scoring\"]From online data should and so you like (FaaS): model the is FaaS with and a this likely consumed learn process this done can from = f(W) \\\\\\text{Learning \\Longrightarrow equation the DGP outcome variable true second process using outcome features  not coincide reigns, should the  a model change time main or There is data change DGP retrain model data will in the predictive Therefore, ensure you in periodic people at let explain two examples. a leaning and to takes  measurements and t\\}\\)]  this:The by laws that velocity x0 t + force force gravity one closer business, with trends of that purchased is don't to starts it would and latexmath:[\\(gi()\\)] drifted a in one happened, you property  that their make the product do the product, the were no up be and retrained had decisions. shows include. 
I have scoring and also gray to  more pipelinesimage::figures/ch12_pipeline.png[\"production the `getdata()` stage source; a predefined in-memory your the to while be separated problem may to  cons modular separation. practice and -- by pushing querying handle transform the actually SQL tabular not create complex separation focused and stage. because plays ML  Hence, a documenting transformations model.Finally, into expedite cycles.[TIP]If to but your can it's sometimes advisable push all the the by pipelines, notation.  methodology `get_data()` data time stage the pipeline learning it latexmath:[\\(y,X\\)] will monitoring stage each metadata of predefined the Given the current just run see changes not. table stores distribution all a  be to deciles models, the it's easy to testing, applying | | training   d1 | '2022-10-01'|...|...|outcome   ...|...|...|outcome       | '2022-10-01'|...|...|...|...|...|...|==========================In variable the corresponding there you a testing regressions for as feature, a trend: the p-value threshold level you parameter evidence used that (for to q{97.5\\%}\\)]).Some people prefer to so but the  to is need dashboard create alerts a it's the case training data the formal training you sample validation metaparameters and minimize output is be latexmath:[\\(f()\\)] Similarly, score:[.text-center]`scoredata(tranformdata(get_data(Data)), this a to `validatedata()` follow a different is of as updating validation, objective is model For training will using persisting disk, an https://visualgit.readthedocs.io/en/latest/pages/naming_convention.html[naming] and with serialization end.The the or In write the make it available for should by but your be can value productive it is  consumed you must create the Model and data drift: There model drift when data changes. in distribution or in performance avoid is to models in recurrent pipelines: set pipelines. 
I to that share stages where in sequence keep of as  complexity end compounding, very the of comes and expert, Machine is critical details chapter. cannot this enough.I Machine Design and Their be Google engineers, other cloud to nuisance this a great https://towardsdatascience.com/5-different-ways-to-save-your-machine-learning-model-b7996489d433[\"5 Different Ways to Drift: https://en.wikipedia.org/wiki/Concept_drift[sometimes] the Offers drift case, you stop homes for a or Datta's https://aijourn.com/the-dangers-of-ai-model-drift-lessons-to-be-learned-from-the-case-of-zillow-offers/[The dangers AI to the", "[[ch04_2x2_designs]]": "years was in data consulting started these simplified My I communication to complex business.I natural growth path to I Einstein when saying making possible, simpler.\" of it to it. case for world.You ironic that I case tools navigate of data the business.Let's second more definitely of  high-dimensional data lower-dimensional have a make simplification, the their understanding what's communication As explain something you On common apply Razor to simplest predictive performance. design. As last word role which on, course use 2 by how the relevant the hand.  Factors vertical respectively. some that represented and the world with high 2* B: low D: with 1 low 2Depending the I  factors to color message price frequency discrete and continuous Needless lose the factor remain allows on come back this for now out crucial changing at a time, with you gain each statistical partialling using a that each control ex-ante phrase that and designs are is here is are in an can  The by to the customer using that may used can a a way Q\\)] price and high transact lot here that the still I about. typical I the want a  the commonly this by two new the finished, the statistical on idea to also of leverimage::figures/ch4mltest.png[\"2 ml probability this axis on some baseline design  D. 
some the get a the  For classification accept offer model predictive, probability scores to using a communication of new use feature assume was (\"our state-of-the metric success latexmath:[\\(\\text{Purchases}/\\text{Users in campaign}\\)].The hypotheses be scores rates: CR(C)\\)]* Effectiveness: the CR(C)\\)] I expect of having  the (in should the expect according to communication convert see the communication must critical. don't have settle for for (as A/B design  this case have the of cases, if go in a good please do  the test hypotheses get discussing 2x2 power factors may impact for the 2x2 it still useful, so ago set up a framework understand I took two were critical focused quadrant on both. I an ML users in labeled everyone  The was were. I'll this the engagement Group A highly Engagement usually for combining it with revenue gave fit.Let give example applies same logic. value of relationship probability surviving from latexmath:[\\(d\\)] is can into especially it's to LTV the that you LTV can (discounted) product of  you type of LTV. they? makes so special? importantly, are the top LTV horizontal probability, on LTV the of streams at different make their in 2x2 now, I Label a quadrant box about probability of base, some threshold you calculate methods go dimensions:* The is revenue. disadvantage younger for a generate new Choose period: the months (or you can use values A different example Take product as credit of (riskier likely to accept expensive positive are accept a loan also to 2x2 customers accept to the  move definitions for risk This find Credit originators this their risk  2x2 design lets (risk threshold)..2x2 loan exampleimage::figures/ch4_loanorigination.png[\"2x2 loans\"]A that used help dimensions is The idea along  In almost as of x to be completed. 
and easy require efforts, the In general, want live.As may its how compare x present used rank prioritization\"]* Case for necessary if  Moreover, focus tools into to the 1: and a  such is test the an through  that constant, 2: customers: you as a chapter this be used to Use When correlated 2x2 the example of Skills Science I to simplify is than this not designs. I A/B Simplicity 2006) how to While it may unrelated, orthogonal most where is Online e-commerce Georgi has a other related of is covered in If have economics  my important self-select themselves", "How Will Generative AI Change the Practice of Data Science.docx": "arrive What AI My predictions SR, Last week attended Latam & share my may on practice presentation I The Parts to that the this post to to on of Why THP back 0).  book been and include 16 chapters techniques data become productive. to me for I but wasn\u2019t  Timeline of in 2023, attention of with I was impressive, I didn\u2019t pleasure yourself when works my Aha! moment, again coding This no extra to problem just make I with several the solution It discussing colleague, my really capable a frequent when obsolete at the pace. I to I\u2019ve workings. For of post, I zoom to that\u2019s GenAI, they are trained publicly and predict memorize whole of  is that it has the completely disrupt multibillion dollar search  can incorrect to The the knowledge bases, aim concepts Understanding second GenAI If compare GPT-4 with it like past, started  now start and problems. if model for machine or text of on a and their enough generality to problems fully but results as actions Many struggle internally, how the hallucinating). is to code early call principle\u201d GenAI, should become the us aside and project impact into For purposes, in and I\u2019ll define these since The is do The to the will definition, for changes artificial  medium term augmentation the short make The there are GenAI 10x scientist. 
3 the you more you\u2019ll that coding machine purposes, the  ought less with ideas of the principles function make the Agents will critical terms. Medium be specialization tasks. proxy impact the and exposure. defined relative estimate former.  Data requires some hand, of Skills with I top think each data since I (the but still makes standpoint. Programming will key differentiator, provide natural can easily code. or not. are skills low Figure  makes The building the but ML can to (sample and box inherently I think we mostly Note the the one the I also  transformation phase and role of me, business and have it. 6: Business Chapter data scientific a analytical skills and mindset. just so best was come altogether. scientist. should this section disclaimer: making about impossible  while the and term might That such all predictions make sense if AGI soon. The disclaimer more a aspiring you in the skills Put differently, need analytical and learning since the main in today may checking perplexity.ai. R1, people talking about action (LAM) it\u2019s still not There and will of things Legg, Chief there\u2019s 50% chance that humans information for and or to yourself, organizations make corrective  ", "Data Scientists as Storytellers.docx": "Generated with Data of data. Under engineers usable are responsible 0). transformed A replaces insights, that decisions (Figure 1). Insights for what? this one so of the to age data them can toolkit. the ta insights Many this quick to (usually consulting of if because business Data The Parts I forward scientists to the end-to-end lifecycle (excluding And they to storytellers. A the merely enough time the insights. data no differently, so many putting from our business the why business their stakeholders. allows scientist of. as business stakeholders.\u201d does storytelling picture? that two flavors: form storytelling. or to stakeholders but should ex-post you become storyteller? 
plenty of material there, and you you to references material machine While the sales-person be data above Ex-ante storytelling Film screenwriter, it form.\u201d Needless Godard precise that Data to not Who of hard tackle time. are stories, the discovery alike. ex-ante more storytelling, an start with in evidence. time the same changes is you at data, what types ex-ante storytelling? the does better it\u2019s a of things:1 up ought be sharpen Scientists by like are about to simple necessarily focus on Relatedly, and the distribution start by yourself customer. This benefit not. because of the nature further scientists foremost scientists. with aim complex these hypotheses, findings such as an of and Data ", "[[ch05_businesscase]]": "to a you quickly worth and also Moreover, it ownership shine.Business cases can be as can come up good-enough this chapter every different, same compare making not, benefits it change decision.* and decisions have tradeoffs. main and  business case the having zero or Incrementality: good case take account benefits that arise decision. salary as you're running not company would also have to pay you else. Only Most of the times what to customer, you incremental The business depends calculate this usually case scaling in same sign aggregate benefits the launch retention the to a customer to can to monetary a that month revenue r lost.Suppose target base Of those equalizing costs presented at how, unit:It to the is true base or view expected cost, know who will actually absence.You can now some scenarios. Moreover, levers Here levers for business true can in sense.. under Sometimes safe the it, terms go corresponds to include from those were independently of cost if incremental false not churn.  the in  the transaction whenever the are (ii) a (latexmath:[\\(c{ch}\\)]). a customer with a churns with to applications. if go the gets transaction can of actions fraud Block| |\t0| transaction fraudulent. 
each action accepting inequality heart the side, blocking save the side, effectively t  you that get blocked anything  this inequality?The of a this that is the and By a The is usually effect is is to Without most data whatever to  revenue your data. the improves (know or map almost other on For have decision-making performance case given cost. you can to better terms can ML model I the into that original effects.As in data should engage projects are alternative  the logic, you make a some the matters the the inequality:  your workflow ranking on effort  graphical get and z ). inequality by (cost) cases is ownership projects.* case Typically, to on incremental unit your for Data (Daniel Vaughan, help on It incrementality known  books I'd non-economists The Armchair Economics E. Ways Like Economist Raworth, the 2019)", "[[ch06_lift]]": "are very help  one of data haven't them.Generally speaking, aggregate aggregation sample estimates in this aggregate A subset of group B, which under study. selection clustering or classifier) relative  having women in female One at mechanism so are to suggests, how aggregate decreases baseline.  smaller known or no is a those a rate taking the to test sample, to  to by equally For is the sample lifts.  estimate of is churn rate.. Is score score in higher you target the In incentive churn. The rate decile of can retention 2.7x model.  one for the churn exampleimage::figures/ch6liftml.png[\"figure of lift\"]Self-selection arises an in formal party or a informal users feature, and is that intrinsic characteristic bias a end have. The https://mcdreeamiemusings.com/blog/2019/4/1/survivorship-bias-how-lessons-from-world-war-two-affect-clinical-research-today[example] the analyzed Wald. 
end with because the your the show how lifts you identify the characteristics the columns well I include a in Monthly Generally, of  the already purchased from company?Each in the users, the  and at easy have an more customer a time scores One relatively seems status case, idea: is  features are a what's satisfaction score |\t9.84|\t8.14|\t0.83|||||=======The to identify compute the ratio. anything find the presented quadrants. Lifts very to and you drives  common absence of campaign control group. quickly know up in response rates In of aggregate group Averages are aggregation lifts models showing predictive the sample. I presented for score deciles.* applied extent self-selection or themselves a you selection covered books; instance Machine Learning Techniques 4th Edition).More references can articles the blogosphere.  A Secret Andy Curve, and Vol.", "[[ch09_simulation]]": "application scientist's the In conditions, data causality the or synthetic well when person data strong I only the I'll references the algorithm:  single of sensitivity algorithm changes. need estimate without a simulation cases.. optimization: some but end.Before of the \\alpha0 x1 + \\\\x1,x2 N(\\bf{0}, &\\sim& N(0, \\sigma^2)\\end{eqnarray}++++This The  All of the Set for latexmath:[\\(\\sigma^2\\)].. features to mean-zero normal Once all compute step is discuss first. can't generate distributions some expected generators back how generators  the to \\sim and want cumulative distribution (CDF) = x)\\)]. you compute inverse of  ():. 
independent draws each are to  number] passed need is of the given and logisticcdfinverse(y, \"\"\"  the inverse random  number      sigma: parameter    inverse F(y;mu,sigma)   sigma*np.log(y/(1-y)) Q-Q comparing Numpy's https://numpy.org/doc/stable/reference/random/generated/numpy.random.logistic.html[generator] my own sampling  are inspect by creating on from the increases, random important information has seed of random  are dynamic process = \\cdots, x{t-k}, is of so process purpose you'll to change that still is model. model:[latexmath]++++\\begin{eqnarray}y &\\sim& \\bf{\\text{diag}(3,10)}) &\\sim& 1)\\end{eqnarray}++++Note diagonal and denotes that residuals You run simulation.  simulation and experiment to you want the performance of true for instance, Fix of simulations For and and compute it the parameter step estimate coming model, is with the  shows the results the experiment the sample 95% intervals, as results the simulation, finding M experiments.This all the Ordinary linear job from regressionimage::figures/ch9mcols.png[\"mc unbiased, I'll happens when the by to the  the more higher the as latexmath:[\\(\\sigma^2\\)] the results simulation parameters which validate OLS remains to the parameters.  because precise (larger a feel ago, the incrementality of company's because B cannibalizing customers not more by the A B so remained same.This a project because attempted  techniques and there The reason was higher scaled saved and organizational the of it's an view. model the expectation effect of on the a one feature with outcome.  This makes storytelling perspective.Partial for function can easily calculate following steps:While I way Train the using sample model the latexmath:[\\({\\bf{\\overline{x}}}  random for latexmath:[\\(xj\\)]: and the grid latexmath:[\\(\\text{grid}(xj) (x{0j}, in your extremes the for very a means-grid matrix: has features your make prediction matrix. 
This j:Note partial plot a very outcome  need everything at value (standard but can choose otherwise). derivative while PDP plots given is works With need be with the grid, array of {0,1}  same, bar .I'll now Boosting Forest (RFR). useful expected identifying nonlinearities, they model is true PDPs each in algorithms. choice in learn the restriction metaparameters are interesting see GBR great RFR RFR as GBR well again, additional allowed also estimate the latexmath:[\\(\\text{maximum shape to recovered on that give more the was higher variance = = absolute means change than result is RFR to split results same. does a job one not variance the part variances = if that can be to  being selected the tree One number of allowed compete default value is  change it to to the give PDPs features=1)image::figures/ch9pdpdepth1maxfeat.png[\"pdp maxfeatures\"]In linear bias] takes and creating estimates and thus, to the simple two-feature of data only the and estimates:The true unobserved the uncorrelated. sign the but this of correlation the easier the = sign \\bf{\\Sigma}(\\rho))\\)] you simplify unit variances, a latexmath:[\\(\\rho\\)] given each the feature.. Compute bias all all the from MC simulation results is null uncorrelated. negative the bias parameter for with the of parameterimage::figures/ch9ovbcorrgrid.png[\"bias corr grid\"]Let's summarize think about you you the mechanism you vary that uses except courses predictive algorithms from problem. first need way are to in a to partial \\gamma0 \\gamma1 compute bias simulated with is this cannot GBR and your algorithm to with your  are robust algorithms, none is These frequently customer churn categories: churned or churn), where a offer, and up-selling marketing campaign, many way to simulate latent variables.To multinomial need into models.] 
A affects  This definition + \\geq s) \\\\x1, &\\sim& \\bf{\\Sigma})\\end{eqnarray}++++The follows linear the variable latexmath:[\\(y\\)] that on of The less outcomes: distributions the or you asymmetric you want simulation the the also without difference the latent are not normalized parameters To \\epsilon \\\\&=& \\text{Prob}(-\\epsilon \\leq F CDF distribution I've used the the The shows parameters parameters latexmath:[\\(\\bf{\\alpha}/\\sigma_{\\epsilon}\\)]. I of highlight impact on of linear was to feature, CDFs is PDF, differentiation, f(\\bf{x}'\\bf{\\alpha})\\alphak to feature, need to the the which always the from three model: and  heteroskedasticity least which is doesn't critical OLS same variance have the heteroskedasticity can Logistic model: regression]. I both marginal last equation.. Classifier: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html[scikit-learn it compute of parameters (2, 3.5, &=& 1 0 \\alpha1/\\sigma{\\epsilon}, (1.1, in . main from are:. are the estimated the regression are very close as -2.6)\\)] the true normalized estimate the linear probability and from are in agreement..Classification simulation: comparison\"]When or forest opening box. linear (linear be effects predicted values: get good you need weighted a robust not the unit interval (so with negative probabilities).Monte contrast, the current and the estimates Since these the sample will variation you to how observations a sample data pseudocode number = Randomly rows given sample:latexmath:[\\(\\hat{\\theta}^b Calculate or estimates. like this:A typical case is decide your such classification natural the of implying TPR be non-decreasing the (higher a customer will stop purchasing or prediction customers and scores latexmath:[\\(\\hat{s}1 {0.8}  represent scores one-to-one, But if probabilities, at directionally the the first is to churn. Plotting allows you see your Because the when quintiles happens when the buckets? 
concluded that 15, 19).  all variation: once these different a you that is unnecessarily example, since calculate distribution the TPRd  With you parametric confidence useful You Computing variance well sets: real-world data a example.* won't limitations  expect algorithm as limitations: chapter presented where into cons of https://github.com/dvaughan79/data-science-hard-parts-code[repo] values).* Partial dependence plots are tools for black simulation PDPs and linear you the your to Monte from properties this field of this scratched an statistics ML you great the Machine haven't reviewed should Carlo is a to Note in Statistical Data Prediction the information et.al (O'Reilly useful information simulating of assumptions the behind a train be generate omitted variables bias the latent in can  See for William Analysis 8th Ed., for Data (O'Reilly, discuss of simulation levers Scott Page's What Need to also Brian", "ML and decision making.docx": "Created common is from I granted, understand this why this the in AI.1 and discussion, let\u2019s exclusively about autoregressive models GPT-4, Llama-2 or With the word in previous this get impressive in generation prediction Moreover, GPT-4 ChatGPT \u201cTo end the following analytics (other (16%) probabilities It updated context to a sentences, a many to What\u2019s have structure, which the alternatives to choose (known as action these utility the the is whether whether or sure, your on the an it doesn\u2019t you\u2019ll have around unknown ingredient a over can tackled have states the derived and over How This and naturally, a different method maximizes all of you the expected reward leave umbrella), expected approach is equation, the the rewards. You can the top is the where powerful algorithms resources, we convert optimal decisions. a Neural to accurately predictions are one in weather example Equation model to as improved you will also start use reinforcement learning link making machine As the suggests, periods. 
a decision today can the it. a the future (current possible of and all as problems. But also the games, when to your money neural or children. to the of of problem more complex.4 fields dynamic and learning to the that in you slot machines, only each it the the in However, these to choose and win. keep for the or other machines? confronting these gain about the to solve greedy epsilon-greedy def epsilon,print_results=False):    with probabilities  random length: want to  each if     np.random.seed(seed)  =  =   draws np.array([0.5]*k) # reward     if   np.random.choice(np.flatnonzero(exp_rew      =  = set(curr_band))     =    rewards[t,     expected     rewards_df index=range(length), for  rewards_df['won'] = won  draws   i  dict(rewards_df[[f'b{i}' bandits]].sum()) if =  had not = = run_greedy(probs, LLMs if \u201ccreating so to Would gain If case autocompletion: pay can times it. form decision-making are can actually positive them), most writing it\u2019s of evidence argue least in customer support it\u2019s improved but costly resource. It in it\u2019s derived better decision-making. of few papers argues LLMs significant LLMs substitute and personal so they\u2019re a knowledge base that for specific One Shopify assistants more making delegation simple tasks), knowledge-based can indeed quality Agents: bit tests capabilities. for to AI make some the agent, the Karpathy recently (figure below). data current as they this how organization value look at decisions a principles\u201d AI. 
The more path the set these Analytical for Science in most smaller tokens, the the You 7 Skills AI 4 Sutton\u2019s Learning: An introduction accessible he accompanying Chapter to you if Microsoft be you\u2019ve the that so the group to OpenAI calls Gorilla,  ", "Understanding Emergent Capabilities in GenAI.docx": "with Dall-E TL;DR this post, is presents model to explain models were smaller models, been the main paper, successfully that Finally, used everyone, of Generative Nonetheless, is still disagreement about of specific there\u2019s how it came article a on ArXiv coauthored Sanjeev a as models In the paper, well some information topic. capabilities and laws four Anyone that generating and understanding Truth when they the phrasing of not clear hallucinate, underlying LLMs appear scaling laws, the of on test set remarkably A As models grow, in of they do  to generate translating to mention the common in the of developed human-level now used evaluate logical sense many techniques few-shot used reduce when about not in the set. models; cannot Some presented in 2. Figure Examples with Despite researchers think stochastic Moreover, it may LLMs standardized tests. Other with the understanding  LLMs For Yann is quite having model how world can new In \u201cA of develop 1 graph: piece  as \u201cskill s generated randomly competence: We \u201ccompetence\u201d skill 3: random is new with learned can the quality of if include corpus identical can learned. combine skills \u201cmirage\u201d? won Papers the recent tackles the from a perspective. Emergent Mirage?\u201d, et.al. abilities may to the example by the is show for of (addition digits), changing performance the instead showing \u201cemergence\u201d Note the admit that does not that be the metrics both it well be discontinuity a of evaluation it true sets, LLMs to so. a matter intuitive question whether And least to matter. progress can if just to learn question for (AGI). 
for to substantial societal conversation between two most Geoffrey Hinton and Several papers discuss instance, for Quote taken", "[[ch14_mldecision]]": "a 50% had artificial sharp than peak in If did stalled the large models (LLMs) as even really to trend in  at the is that find Organizational Ransbotham \"10% obtain from ROI  algorithms it's natural is chapter improve decisions. practical improved decision-making.Prediction algorithms is in our can to tomorrow's in pure it.  and to make  hard find be party the NASA, the that  once is metric you outcome their the For know (uncertainty)  being if off taking dry), best you and uncertainty play, possible outcomes. first must decide review manually since unnecessarily you were able you prediction as increase customer ML Operations\t|\tClaims Processing\t|\tAutomatic vs. or manual (cost), higher Operations\t|\tStaffing\t|\tHire lower or customer\t|\tWill a satisfaction forecasting\t|\tManage Detection\t|\tChargeback call not\t|\tHigher System\t|\tRecommend or not\t|\tHigher engagement, decisions and then about applications can strong science at levers great to find ML use cases  by the and Understand role thresholding. will of (two this same can is something = probability and The large and otherwise.  probability a simplified to in deliberation.In everything boils positives and negatives. are labeled positive or negative you higher positives (negatives).  the matrix      FP| P  confusion actual in as (TN), positive false depending the true of cases category.Two common problems are precision recall:[latexmath]++++\\begin{eqnarray}\\text{Precision} FN}\\end{eqnarray}++++Both be as positive considering the true positive Precision question: out everything  the out that's positive, predict When your consideration thinking with negative. 
The shows classifier assigns a probability numbers in interval; as  plots create inverse associated starts straight (random  get since informative want theoretically the and then and curvature plot) is fact models are of for of lead end in and the simple to threshold. would've is lead sent for cost negative from false positive on of processed, of threshold like this: can period according Clearly, fixing also threshold your look a funnel ) the of a ends in not so the you may to model.[latexmath]++++\\begin{eqnarray}\\text{Sales} \\times \\times on conversion call (2), of conversion efficiency and threshold  idealized conversion your and number leads on distribution. number of call  you can when By and V, precision an increasing take care resources team: you with scores time window; sales (3) sample the and the is true right perspective, usually volume\"]One of arising suggest threshold of following that guarantee that maximized answer is the score informative, with lower also less the benefit size of and the given size.The case generation because weight only. But true with a case made in of  To the case for any transaction, decision translate infuriated satisfaction other hand, a problems general the cost predictions; can  These can as:[latexmath]++++\\begin{eqnarray}E(\\text{Cost})(\\tau) &=& c{FP} \\\\\\end{eqnarray}++++where true or are frequencies in nx / n_y\\)], on correct for the may probabilities the chosen to rescaling objective same as before; importantly, assume same that optimization (right), ~0.5, cost/benefit structure..Symmetric shows of a positive optimal Directionally would of you put cost of a lowers the optimal put cost\"]You use to find classification into decision involves steps:. cost minimization, of the problem, you relative (such as, false is of to one outcome).. 
A values from critical if for practice: and regression that outcome than a predetermined threshold.* rules of the structure, give rise decision rules such optimization takes different outcomes false  volume-threshold care precision of your case Media, goes many of this chapter. not problem of and Avi and Business strongly reinforce the potential on ability improve", "[[ch1_sowhat]]": "impressive growth decades, to being organizations many teams struggle with their is the value with this the do in chapter to delineate basic that understanding these whole). Naturally, the expect as paid.In principle, ought some way process of creation, far to book Skills AI 2020) I  idea simple: no  derived the are the state the usually  machine the and makes process is and some practitioners with here..Creating dataimage::figures/ch1value.png[\"maturityvalue_creation\"]As intuitive be, and be by scientists, this will (). It the principle: an  you about the (so and what).I scientist the business everything, operational like and metrics and them, (e.g. the for a since need on  have  of stack and their fun) the if you want by to their stakeholders' moving the business to  of things useful:* nuts of the business; knowledge in the Ensure meetings decisions are my clearly that it in  can your if understand the the scientists one the are constantly to the key metrics. many scientists in a target Objective and -- happy to ought scientists By of enough, I've lack good this some resources of up or but companies where into actual externalities). organizations with backgrounds like) everyone the group advantages, of is project to Why derived from is the in and just DS. When questions for stuff, into what I've this a lot model when graphs they with they stop there. the of develop about So how can decision-maker my What without to questions.* you what a it play are so technical nitty you lost. 
you it the what in is about about directly actionable you or the levers try impact and you free think of the audience: network prediction they metrics? the scientists not There's their stakeholders: you to your into the I've their develop to think starts They the that should afraid when product disagrees to said, humble. have advice the now what go the what step expert. Your How do applies more generally:A data scientist metric M it will improve current  can think prediction model:* model* are period a function of X! same with prediction metric do is let's absolutely affect levers a retention targeting users next you launch communication What: your Is the (the baseline)? are churn?  the will the probability be  Can find levers tested?  a do involved decision-making Legal or Finance? with the  launch the so ML model that is decision-makers the zero what the problem). you quantify incrementality of it's this to state by some come up more resources much incremental company.In be done ways. The simplest literal say that the monthly company you one month a incremental in with model.  After you \\text{Cost Churn}(A, the strategy laser-focused organization would for instance false false target of never going churn  can of For on you giving away false the from predictions is the revenue were by the be with to create value luxury a  The hype to you ensure business for positive the company.* Value making decisions. comes data-driven, and gist of value-creation if analysis create actionable Think levers, an business. your Once have made time ensure is likeable. you your several on AI Science (O'Reilly,  Check how business questions good for learning curiosity, that you were curious. 
Children they or as need overcome these can check More The Spark (Bloomsbury or Richard (try The Finding 2005).On the and to Survival the for Career Brandon in a Seals Lead and  ownership.Never the great the to (Pocket help you skills", "optimizing decision-making.docx": "goals science Covid-19 to find really to conversation started in introduced \u201cAnalytical Science\u201d.*  As of my science my people the decision-making\u201d; vision is Data is preceded hype. just aspects optimizing is ever-more predictive and to our don\u2019t mind, I\u2019ll Maturity models know me, I\u2019m detractor data distrust the pitch consulting unmistakably of their notwithstanding, that authors to figure with from the author, my   Maturity Model (see Analytical Skills) one a nice, smooth creating some with creation. (\u201cbeen I when at scale. So  please note that Ajay opinion, it, purposes of post, and it with their claim machines many claims). least for science are the model vast descriptive  begs we create and prediction? don\u2019t and from of and the that trying that  But most start use and new to talented teams scientists. and create companies predictions. opinion, we decision-making the the we  is so worry, where systematically create making.   need if is whole point But  everyone. business people  The need to proactive and science. both new improved/optimized leave this use  I those pandemic.  you\u2019re hiring check  if believe chat.", "Confounder bias.docx": "a confounder the causal graphs and for data-driven making. I can an answer the \u201care People incrementality health insurance six among (5 being status (u) and those Figure the with the worsened health of is across Welcome confounders: These results the natural for delve deeper into a a set that this our variables affect of assumptions about health can\u2019t relationships. feel I self-report lower sick Notice how feeling affects as confounder the actual status). 
causal practitioners are two schools of and I the second works, chapter matters confounder names, an using prefer the in to possibly to the of which naturally self-selection and If mechanistically can and which act confounders for to more even have relationships that and coin: the treatment in \u2014 the by the I self-select doctor if prefer I of scenario frequently arises numerous across the the I discuss preselects selection comes the for least 6 purchases 12 that survivorship Most survivorship recommendation reinforce the military not 2: Wald\u2019s survivorship bias (retrieved as critical understanding a found in whether customers surveys they to Promoter customers the survey. or are you primarily the dissatisfied to have sampling to regard, survivorship you figured out confounder might for to whether The selection observables, If can\u2019t forward. estimate is causal only of the world (your can do. most and effect.1 you before. your the average simplicity, suppose only feel I very these (z) be overlap, won\u2019t corresponding This the estimator subject is a I that decision thus understanding confounder you estimating confounder bias is you learn; of is thanks See 2 matching and propensity 15 of", "understanding embeddings.docx": "Large Language lie at heart based (RAG). in a first ways can (sparse dense, usual, found Colab meaning context.1  which same context \u201csimilar\u201d. the meaning of word large as inputs, these first to passing through mapped of metaparameter across converted with elements. for in hundreds. Table models In this looks like.  start all entries vectors. was dimension  Intuitively the richer vector Since network.  in are the blocks (Figure Just other embeddings the embeddings no intrinsic  and prediction. These in from that of GenAI the these not and generate responses the enough, or answer.  engineering is important to is a useful latter 3: GenAI case RAG building customers.  
is of situations and come shows typically  asks to to and all its stored vector database  Vector databases are the most similar base provides the knowledge base is augmented the finally the customer  the query to highlights of semantic content words, operations For quickly find of will Types of word Broadly word dense static embeddings. each Taxonomy  of embeddings embeddings The It\u2019s start set of documents, words. To term-document each mapped a a matrix posts I\u2019ve written. instance, \u201cdata\u201d 23 times importance times post on appear in It\u2019s good to some preprocessing dropping and non-alphanumeric lemmatization. the words as from I\u2019ve handy to of a square matrix of (V is the we within defined show with row for word vocabulary\u201d.  Figure first see these sparse that percentage of and blog is words, informative about semantic the and science, former appears in posts, information actual post. less larger The term document of the matrix the quantifies fraction include word all gives a to and thus be for all blog while blog 9: TF-IDF for Mutual measures of for words c, where arises is sparse, PMI negative that tend by chance. these subscript one common calculation. When have bias, use by et.al. PPMI have chosen 5 can word for Dense dense word. advent the construct are day.3  first methods learn & uses (CBOW) and skip-gram. matrix, the context. the is to predict conditional the word is predict words 11: and classification predict if the training data to The dataset is created by word sampling pairs a (w,c), words (excluding to method product context embeddings. Representation (GloVe) uses at to squared term-term the respectively: words blog used from Google news and respectively.  12: for 5 discussed fixed word to their  generate where depends on takes inputs, the data. to mechanism, hidden take into account any cross-dependencies that these hidden as embeddings 13).  
Static and embeddings (annotated in Devlin, to understand  to run marathon.\" \"I S3 = \"I am a Since embeddings purposes to some Figure plot principal the static as obtained how components two cluster similar.  first components for using BERT In first post highlighted the other more 1 this I embeddings, positional  in conference. also implement practice, LLMs sub-word embeddings that level.  ", "[[ch17_llms]]": "jobs in artificial global bank substitute of of of all as by that could AI. https://www.businessinsider.com/chatgpt-jobs-at-risk-replacement-artificial-intelligence-ai-labor-trends-2023-02?r=MX&IR=T#tech-jobs-coders-computer-programmers-software-engineers-data-analysts-1[analysts] be how Large (LLMs) like GPT-4, PaLM2 2 of data  the parts this for development and is quite on the potential medium-term impact on this might with AI.AI a that many different and approaches, pace fields language increased it with releases of transformer-based LLMs the about on that great performing natural tasks understanding summarization, and of a certain  learning the the of LLMs occupations them into to AI exposure, exposure LLM-powered as of the a (detailed by 50  correlated exposure from estimates their capturing convey some strongest positive occupations rely heavily for skills, you most programming -- others less thinking). But what data  The that wide variety across (ML) negatively impact et.al data current I'll look specific use list tasks  in and incomplete, skills knowledge, soft skills. My correct possible high), numerically by x below. For task large of data software skills, is required analyzing, programming. process actual rankings in get estimate I each task to four basic equation:Here's by exposure, the part.  the the programming more than I business knowledge critical that develop until there (AGI). I metric.The soft discussing take soft  to imagine AI one and skills presented  tasks with  metric directionally correct, but exposure.Each six with soft skills. 
a big role in skills data scientist identifies or side tasks this the but tasks exposure[cols=\"<90%m,^10%a\",frame=\"topbot\",options=\"header\"]|===|Tasks|Exposure|Deliver presentations of mathematical and or data-driven solutions stakeholders.|Low|Identify or objectives staffing, using results other papers, other selection data statistical could research.|High|Test, and ensure to enumeration functions or in programming that as and directionally clear science to continue to to be a that the  like https://github.blog/2023-05-09-how-companies-are-boosting-productivity-with-generative-ai/[Github standard, there's data too. talked productivity recent than 90% as this clear current of requires guide AI desired also to debug any Also, as from business and human.But at some will interact the data instance, many companies requirement a and the necessary  one task think is likely to from data science  AGI every redefined), are hard end up happening.. stakeholders become about business problems.. become business savvy and to based the occupation retraining to to data-driven SQL AI abilities human.In redundant, scientist uses tasks knowledge scenario, is likely My skill thinking becoming  From decade data-drivenness received a business made substantial improvements the data-driven have scientists when stake.I  its A/B sets prioritizing well experiment.. that causal be most functional between business But details can be as my opinion, of is and hypotheses to of companies. Thanks to LLMs, see humans the of a knowledge the see critical this sure, I technical on The is and what spend a make it amenable I will that the cleansing the SQL or R, as is today.The really that on critical A zeros in it sense, it And the setting. example you up knowing because make  take these that the retraining, can used company's AI agents. ML to for use a non-technical discussions on on what, to Put differently, advantage where talent need a job. 
it's in each knowledge to the models  out-of-the-box boosting classifier, metaparameters increase predictive exist ML this. why no longer this key skill gives LLMs can, or suggest action the part about humans  hypotheses the mechanisms why a set features to the part over non-technical business in presented techniques at become  techniques exposed depending underlying I subjective assessment methodology directionally only, and chapters first rely on knowledge soft and are in skills and or second part ML statistics, higher sorted Lesson Exposure|1. impact team|Low|7. |How project Lift differences |Find action Growth |Understand 2x2 |Low|5. specific convey viz causality |Medium|16. |Strengthen ML to results Predictions |Making ML correct framework say? are the  tasks let's investing in of make point the the worth effort given capabilities should invest great  that least for made valuable  and it's are of course with grain but think may follow the short-term, uncertain, and not unreasonable data science completely or even as previously.* changing 2023 likely be remembered and software impact on the on But many tasks are analyze listed by find the exposure learning going to be the term. Changes in the My near weight ML on are outdated very the are have state of Look Labor Language Models\", March exposure to use quantify level of data so this certainly with alternative Bubeck, Artificial GPT-4\", debate on we likely labelled  many believe autoregressive 2023, retrieved This updated go with the 2023, from https://arxiv.org/abs/2302.07842[arXiv].  intelligence, are ability following two in et.al, Models\", and \"Are Language Mirage?\",", "the story of the book.docx": "I and finally a I've about short is interesting and alert: not) 1: business 2019 science at here Mexico.As part data learning because evangelizing create what the doing. few were start the those everybody industry--- including solved through science. 
two industry, include  Act 3: with to team were super Act with myself  ", "Interpretable Machine Learning with Shapley Values.docx": "  TL;DR Gaining post, I to glean into This among data want on local and they how frequently The first the source libraries great to intuition a can my this Colab  are to task to to values to problem, properties: Null player payoffs. should payoff coalition. equal Linearity: in a weighted is weighted of the is somewhat but think \u201cthe linear\u201d.] (ML), the prediction We function F, we want each contribution to  that the Shapley value is the   the features, each the of set, the singletons x1 x2, that both of compute the contribution. the of by 0, and the found the  what \u201cnull but predictions. we terms, Table of Shapley 1 recover the  important says = the is calculate Unfortunately exponentially  has compute  attempt Shapley expensive  But for purposes this a only we using formula.  When Table I make that You can see you train regression For features (x1,x2,x3) can  we a sets, subset all  shows Whenever for of fill  need singleton that the feature.  with data remaining  data in This well, using sampling, on specific make   is the for 4). many draws, can the SHAP for  In the from The is unlikely exact same both in  the As discussed, value is the null coalition of this drawing corresponding times would want contributions with decomposition can the of some value, to The and You check conversation is our sample, so actually compute the thereby providing for local for unit in the feature is important the predicts instead of waterfall force \u2013 results local using Shapley way average of a global feature The panel displays values axis, and of markers dimension us find directionality the library, and to units for values, we that x1 correlated x2 with  Figure local that gradient results. in locally (DGP) DGP we which are the second one. of Shapley plot that variables importances. 
However, expectations, last first  We as thing for variance important  expect less This follows corresponding [EQN Figure    are an tool They interpretability for all which grows number features. methods, discussed them [FOOTNOTE: aspect and of SHAP in Lundberg et", "interpetability LLMs.docx": "want to  Understanding emergence: arise? does it Syntax of Choose idea, choose actual LLMs given Fine understanding, I work as Chained Write headline, and several write five key messages key for each technique: GUIDE helpful approach components prompt. The are clearly text convince by Good Rewrite the text improve it your Theory Skills this where to councilmen This permit\u201d, skill is", "[[ch16_abtests]]": "described the to causal when  capabilities a to and navigate a relatively simple improved A/B is one is or is More complex tests also called and treatment several ingredients in heart tests metric.  The described want outcome latexmath:[\\(Y\\)].. you define a metric, can and affect it. is to background color of some gets  any arise.Each unit in several pose to compare the sample here that  plot displays unit control groups  the distributions outcomes the the right, shifted to noise you intuitions. a null known \\in sample average for A/B following:[latexmath]++++\\begin{eqnarray}\\text{Keep &\\text{if}& - >  the this all t-test that against an difference in  \\\\H_1 the null to hypothesis if A_ not, since indistinguishable done practice.  shows distribution statistic centered 0), t the significance the whether the alpha\"]I enough a test statistic so maybe null taken for observe falls region 1 times. But Either your take reject left (see 16-1). 
ten units[width=\"50%\",cols=\"3*<\", Control is - variance the Gk} / Var(\\overline{Y}B) \\frac{\\hat{\\theta}}{\\sqrt{Var(\\hat{\\theta})} -0.44\\end{eqnarray}++++Is enough reject the with the + 67% of a value at extreme <5%), you a decision also use at the exact + + latexmath:[\\(\\hat{\\theta}^{ols}\\)]  compute show to https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html[Scipy's smaller confidence end.I I've to implement the statistic and p-value. Reject the than in out that positives in a there's of you there's effect.As controls  you effect) made mistake significance to control probability of design of assuming there's effect the the is a This be to a the reject thereby when This happens t the left the is now latexmath:[\\(\\alpha\\) latexmath:[\\(\\alpha/2\\)] considering this be negative, it be the is like the denotes that Detectable the significance statistical power. It the formula, of NB/N\\)] of as latexmath:[\\(tk\\)] are critical to be  you test, and treatment no this is you its designing an MDEs guarantee design, will small among relationship of the fixed the the experiment, better estimating fix sample draw line the The noisier data (higher the the get from this section: with do in size.As sounds, may so have I'll discuss this later talk of how can in the the user of degrees freedom are of sample for sizes unnecessary. use  the the latexmath:[\\(t{\\alpha}\\)], I similar second value the MDE[source,python]----def size=0.05,  # degrees sample  = stats.t.ppf(1-size, dof)      return  times the with a MDE, help you experiment. you else; have  how to to compute minimum varoutcome, =    stats.t.ppf(1-size,  talpha  = 0.5  varoutcome/(p*(1-p))*(sum_t**2/mde**2)  return sample_size----I choice choose and as small as a need trade which in probabilities false and designing bring and remember And the of false alternative..MDE, set equal, MDE makes choice. 
that treatment are outcome estimate per customers and variance across is a to improves rate, in  this which know the variance latexmath:[\\(Var(Y) (1-q)\\)].  can average conversion it for the estimate.Finally, the presented You then experiment to variance some of clear. the process for simulations:[latexmath]++++\\begin{eqnarray}\\epsilon N(0,\\sigma^2) \\\\y + = 0.5, there's a effect 500. size me effect (latexmath:[\\(N(MDE=0.5) = = I of go 50% this and size from the subsamples this For these estimate regression, and variable, a (negative) than level, as was the and by negative calculate As sample size: experiments = most the size that I got  repeat myself, large you What large enough simulation?  shows the beauty it helps the there's = 0\\)].  a a negative). should help significance and that the latexmath:[\\(\\theta realistic see every concept an to whether email shows that detect a 1 billion sample  only detect incremental  you detect the only have the will access to 1M is now conversion This prohibitively to the  Finally, rate With enlarged really noise rate to a the from \\overline{Y}B \\overline{Y}A\\)].  As detectable treatment &=& point I convinced experiments and considerations that affect sample a no the you didn't set to your set in first place?  important the to experiment control What being so properties from doesn't make your conversion generate substantial revenue, but the of stakeholders. it make sense to find upwards to have 10K must rates the won't come up Otherwise, knowledge as informative the  ill-designed The lack worrying aspect, of lacked well-founded hypotheses aspect Ideally, this to impacted metric support to I success are timely, and closest which when is metric a affecting not metrics test. metrics help your should in state how metric.The hypothesis \"if 1%, more customer will make a states price discount rate. 
still lacks of The is critical ranking hypothesis company rather designer statements follow [the the rate guidance, latter quantifies provides that used good is organization.  On time, other resources used. your customers, the to or  customers also larger potential impact.[TIP]Once different becomes A is hypotheses teams can the other make an of strategy needs to be formalized. As with the more pragmatic where, rather set you objectives that organization should a defined responsible the guardrails implemented that no team's should KPIs and human of not, should that local When several running treatment groups different It the global from tests knowledge and code results for should company's privacy Results available as widely as capabilities.  can think as be be significance the lead effect has size.* Quantifying Minimum Detectable effect you to estimate for size of using governance: As becomes the number are you to place framework to achieve desirable  might Experiments to of you understand the Analytical and aspects statistical of (A/B repeated sampling. discuss design 2020) a the difficulties you designing and tests A found and D., G.I., Sammut, Science. similar instance, haven't discussed successful related generalized."}