A Bayesian semiparametric partially PH model for clustered time to event data

and 2(b), it is evident that the gamma and σ-stable CRM based models are characterized by late and early intra–cluster dependence, respectively. More specifically, contour lines indicating positive correlation appear earlier for the σ-stable CRMs, corresponding to higher values of the survival functions. For example, compare the contour lines for the value Σ = 1.05 in Figure 2. On the other hand, the rate of increase of the survival ratio is slower in the σ-stable case. Thus, high values of the survival ratio appear earlier when gamma CRMs are considered. As a matter of fact, in the σ-stable case, two units of the same cluster tend to be relatively weakly correlated in the long term. Those patterns of failures are often observed in familial associations of onset ages for diseases with low penetrance (Fine et al., 2003). Therefore, the parameter σ can be thought of as a dependence parameter. As suggested by our numerical study, if σ → 1, then τ → 0 and Σ → 1, capturing both local and global independence between survival times. On the other hand, a value σ < 1 reflects positive correlation between observations within and between clusters. The interpretation of σ as a parameter capturing the dependence in a cluster is supported also by the marginalization properties of the proposed σ-stable model, since in the marginal model the parameter σ affects multiplicatively the regression coefficients β. Hence, the stronger is the association between survival times in the same cluster (the smaller the value of σ), the weaker should be the effect of the individual covariates of the subjects in the cluster. Once again, the previous discussion shows how our modelling framework preserves and extends well-known results for the shared frailty PH models with gamma and positive stable distributions (Duchateau and Janssen, 2008).
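
The reading of σ as a dependence parameter can be illustrated with the classical positive stable shared frailty PH model, which the excerpt says the framework extends. The sketch below assumes that classical model rather than the CRM-based model of the paper: there, Kendall's τ within a cluster equals 1 − σ, and the marginal regression coefficients are σβ. The function names are hypothetical and chosen only for illustration.

```python
def kendall_tau_positive_stable(sigma: float) -> float:
    """Within-cluster Kendall's tau for a classical positive stable
    shared frailty with index sigma (assumption: not the CRM model)."""
    if not 0.0 < sigma <= 1.0:
        raise ValueError("sigma must lie in (0, 1]")
    return 1.0 - sigma


def marginal_coefficients(beta, sigma: float):
    """Marginal (population-averaged) coefficients sigma * beta:
    stronger dependence (smaller sigma) attenuates covariate effects."""
    if not 0.0 < sigma <= 1.0:
        raise ValueError("sigma must lie in (0, 1]")
    return [sigma * b for b in beta]


if __name__ == "__main__":
    beta = [1.0, -2.0]  # hypothetical conditional coefficients
    for sigma in (0.25, 0.5, 1.0):
        print(sigma, kendall_tau_positive_stable(sigma),
              marginal_coefficients(beta, sigma))
```

With σ = 1 the frailty is degenerate, τ is zero and the marginal coefficients coincide with β, which matches the independence limit described in the excerpt.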

Objective bayesian variable selection for censored data

In statistical data analysis it is common to consider a regression setup in which a given response variable depends on some factors and/or covariates. The model selection problem mainly consists of choosing the covariates that best explain the dependent variable in a precise and, ideally, fast manner. This process usually has several steps: first, an expert's views on the set of covariates are collected; then the statistician derives a prior on the model parameters and constructs a tool to solve the model selection problem. We consider the model selection problem in survival analysis, where the response variable is the time to event. Different terminal events can be considered, depending on the purposes of the analysis: deaths, failures in mechanical systems, divorces, discharges from hospital and so on. Survival studies include clinical trials, cohort studies (prospective and retrospective), etc.

A semiparametric Bayesian joint model for multiple mixed type outcomes: an application to acute myocardial infarction

We propose a Bayesian nonparametric hierarchical model that includes a cluster analysis, aimed at identifying profiles or hospital behaviors that may affect the outcome at patient level. In particular, we introduce a multivariate multiple regression model, where the response has three mixed-type components. The components are, respectively: (1) the door to balloon time (DB), i.e. the time between the admission to the hospital and the PTCA; (2) the in-hospital survival; and (3) the survival after 60 days from admission. The first response (continuous) is essential in quantifying the efficiency of health providers, since it plays a key role in the success of the therapy; the second is the basic treatment success indicator, while the third concerns a 60-day period, during which the treatment effectiveness, in terms of survival and quality of life, can be truly evaluated. Note that the last two responses are binary, so that, as a whole, the multivariate response is of mixed type. It is worth noting that the information on patients' survival after 60 days is obtained from the linkage between the STEMI archive and a further administrative database concerning patient-specific vital statistics such as date of birth and death for general causes. The linkage between the different data sources was carried out by Lombardia Informatica S.p.A., the agency managing regional data warehouses. We do not have direct access to the data sources, so we cannot construct different outcomes of potential interest. Moreover, we work with a singly imputed data set and could not identify the data preprocessing tools used by Lombardia Informatica S.p.A., in particular the technique used to impute the missing data. The modeling of multiple outcomes from data collected in the STEMI Archive was previously discussed in Ieva et al. (2014), under a semiparametric frequentist bivariate probit model.
Their aim was to analyze the relationship between in-hospital mortality and a treatment effectiveness outcome in the presence of confounders, that is, variables that are associated with both covariates and response. This problem poses serious limitations to covariate adjustment, since the use of classical techniques may yield biased and inconsistent estimates. In this context, Ieva et al. (2014) proposed the use of a semiparametric recursive bivariate probit model as an effective way to estimate the effect that a binary regressor has on a binary outcome in the presence of nonlinear confounder-response relationships. In contrast, we focus on a joint model for the grouped outcomes. As discussed below, our aim is to find relevant groups of hospitals in terms of patient-specific characteristics, which may assist in further planning and policy making.

Bayesian analysis of textual data

In order to verify whether that is the case, the comparison will be based both on the word length distribution and on the frequency with which the twenty most frequent function words are used in these sentences. Before counting the number of l-lettered words and the number of times function words appear in the sentences, we excluded from the text all citations, acronyms, capital-lettered words, numbers, dates and names of persons and cities. In addition, we considered only the facts, the legal basis and the final verdict, excluding from the analysis the formal paragraphs that are always repeated at the end of all sentences. These twenty most frequent function words are: de, la, que, el, en, y, a, los, se, por, del, las, no, una, con, es, o, para, su and al. Note that, unlike in the authorship attribution problem, with S > 1, in the authorship verification case, with S = 1, one cannot choose the list of words or features based on their discriminating power, because there is only a single candidate author. This is, in fact, the only feature that distinguishes verification studies from attribution studies, other than the number of candidate authors involved.

Real-time model predictive control for quadrotors

by each rotor, thus there is a maximum velocity $v_{\max}$ and $\Omega_{\max}$. From $v_{\max}$, $F_{\max}$ can be determined. The constraint on the rate of the thrust, i.e. $W_{\max}$, implies that $\dot{F}_{\max}$ exists and is a function of the vehicle attitude ($R$), linked by Equation 17. Hence, through Equation 19, the instantaneous constraints on the control input $u$ can be obtained. Therefore, the state and input constraints depend on $R$, $T$ and $v$ of the vehicle. To the best knowledge of the authors, there is no published work on the solution of MPC problems whose state and input constraints are state dependent. However, a solution to the linearisation of nonlinear MPC with state dependent input constraints has been proposed in Simon et al. (2013) and Deng et al. (2009). This paper uses unconstrained inputs and states in order to obtain a real-time solution. In addition, these input constraints depend on the maximum voltage of the battery through Equation 10, which is two levels down in the hierarchy. This enables the complete removal of the constraints on the MPC control input $u$ and state $x$, thereby transforming the problem into an unconstrained MPC problem.

Object oriented tool to model and simulate discrete event systems

This paper illustrates the methodology used to develop a real-time object-oriented tool (RTOOT) for designing and implementing SFC-based applications by means of an object-oriented virtual engineering environment, HP-VEE. The proposed RTOOT consists of a library comprising a set of user objects capable of implementing the mathematical model of sequential function charts on the basis of Petri nets, in compliance with the IEC 1131-3 standard language [4].

A data model for Cultural Heritage within INSPIRE.

relevance of the spatial component within the heritage elements to the forefront. When dealing with protected natural sites, the value is typically embedded in the place itself that is protected. Characteristics that make a place naturally valuable are inherently attached to their geographical location, and cannot be set apart from each other. A hill, a lagoon or a marshy area are all natural features that cannot be protected separately from the place where they are located. Rather than being located somewhere, natural places are best described as locations in themselves. This could also be the case with cultural entities. Indeed, the relevance of location and place in the characterization of cultural features has long been claimed in such disciplines as Cultural Geography (for instance Claval 1995), Anthropology (see Tuan 1974; or Ingold 2000) or Landscape Archaeology (for instance David and Thomas 2008), to name but a few. However, in Heritage Management, locations have traditionally been disregarded as merely contextual, and even circumstantial, attributes of objects and features. When describing cultural features, such as buildings or sites, heritage experts tend to focus on the formal characteristics, with the spatial dimension constituting just another attribute, rather than a property. Currently, in terms of preservation, heritage elements are sometimes preserved by removing them from their places of origin. Dealing with the two proposed categories of spatial objects (legal and cultural) allows us to make the different nature of both aspects explicit and to make clear the need for a different process of reasoning in their creation.

Dynamic Model for RNA-seq Data Analysis

Correspondence should be addressed to Momiao Xiong; momiao.xiong@uth.tmc.edu. Received 4 December 2014; Accepted 16 February 2015. Academic Editor: Ernesto Picardi. Copyright © 2015 L. Li and M. Xiong. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

By measuring messenger RNA levels for all genes in a sample, RNA-seq provides an attractive option to characterize the global changes in transcription. RNA-seq is becoming the most widely used platform for gene expression profiling. However, real transcription signals in the RNA-seq data are confounded with measurement and sequencing errors and other random biological/technical variation. To extract a biologically useful transcription process from the RNA-seq data, we propose to use a second-order ODE for modeling the RNA-seq data. We use differential principal analysis to develop statistical methods for estimation of location-varying coefficients of the ODE. We validate the accuracy of the ODE model to fit the RNA-seq data by prediction analysis and 5-fold cross validation. To further evaluate the performance of the ODE model for RNA-seq data analysis, we used the location-varying coefficients of the second-order ODE as features to classify the normal and tumor cells. We demonstrate that even using the ODE model for a single gene we can achieve high classification accuracy. We also conduct response analysis to investigate how the transcription process responds to the perturbation of the external signals and identify dozens of genes that are related to cancer.

Robust Bayesian model selection

To the best of our knowledge, the first systematic study of the impact of model specification on the quality of the standard Bayesian inferential technique is in Müller (2013), where the author advocates using an artificial posterior distribution, in particular, a normal distribution with the MLE as its mean but with a sandwich estimate as its covariance. He then showed that when the model is misspecified, Bayesian inference relying on the new posterior achieves a lower asymptotic frequentist risk than the posterior distribution corresponding to the misspecified model. This result points out an important observation: the traditional Bayesian inferential technique is suboptimal when the model is misspecified.

An Objective Bayesian Criterion to Determine Model Prior Probabilities

We discuss the problem of selecting among alternative parametric models within the Bayesian framework. For model selection problems which involve non-nested models, the common objective choice of a prior on the model space is the uniform distribution. The same applies to situations where the models are nested. It is our contention that assigning equal prior probability to each model is overly simplistic. Consequently, we introduce a novel approach to objectively determine model prior probabilities conditionally on the choice of priors for the parameters of the models. The idea is based on the notion of the worth of having each model within the selection process. At the heart of the procedure is the measure of this worth using the Kullback–Leibler divergence between densities from different models.

Bayesian meta-analysis models for heterogeneous genomics data

cell proliferation. On the other hand, the mechanism of epigenetic alterations is more complicated. For example, DNA methylation patterns are globally disrupted in tumor cells. The cancer methylome is characterized by both global hypomethylation and region-specific hypermethylation at CpG islands. Hypomethylation may contribute to carcinogenesis via transcriptional activation of tumor-promoting genes (Wu et al., 2005), while hypermethylation at CpG islands is associated with silencing genes involved in growth regulation, cell cycle control, apoptosis and tumor suppression. It is even noted that hypermethylation is more likely prominent in transcriptional silencing and down-regulating pathways involved in drug resistance (chemoresistance) (Li et al., 2009). Therefore, genetic mutations and chromosomal aberrations are the central characteristics of tumor cells (Pe'er and Hacohen, 2011). In recent years, the emergence of large-scale copy number assays and methylation platforms has made it possible to trace phenotypic differences back to their genetic/epigenetic source. However, only a few genetic mutations or epigenetic alterations provide a persistent fitness advantage across multiple tumors. Such a rare event could leave a 'genomic footprint' in the form of a gene expression signature (Akavia et al., 2010). Therefore, it becomes increasingly important to distinguish genetic/epigenetic changes that alter mRNA transcription, and thus promote cancer progression (driver mutation), from those with no selective advantage (passenger mutation) (Pe'er and Hacohen, 2011; Akavia et al., 2010).

A Parallel Grid based Implementation for Real Time Processing of Event Log Data in Collaborative Applications

scale services. It is, at the time of this writing, composed of 1069 nodes hosted in 494 different sites. Each Planetlab node is an Intel IA32 machine that must comply with minimum hardware requirements (i.e., 1 GHz PIII + 1 Gb RAM) and runs the same base software, basically a modified Linux operating system offering services to create virtual isolated partitions in the node, called slivers, which look to users like the real machine. Planetlab allows every user to dynamically create up to one sliver in every node; the set of slivers assigned to a user forms what is called a slice. It is said that a Planetlab node can run up to 100 concurrent slivers. To test our Grid prototype, we turned Planetlab into a Grid by installing the GT3 Grid service container in every sliver of our slice. Moreover, we implemented the worker as a simple Grid service playing the role of the parser and outputter components and deployed it on the GT3 container of every sliver of our slice. On the other hand, we wrote a simple Java client playing the role of the master and mapping to the sensor and extractor components, which dispatches the tasks to the workers, using a simple list scheduling strategy, by calling the operations exposed by the worker Grid services.
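
The excerpt does not detail the list scheduling strategy used by the master, so the following is only a minimal in-memory sketch of generic greedy list scheduling, written in Python rather than as a Java client calling Grid services. The function name, the task durations and the number of workers are all hypothetical.

```python
import heapq


def list_schedule(task_durations, n_workers):
    """Greedy list scheduling: take tasks in list order and give each one
    to the worker that becomes free earliest.

    Returns (assignment, makespan), where assignment is a list of
    (task_id, worker_id, start_time) tuples.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    # Min-heap of (time at which the worker becomes free, worker id).
    free_at = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(free_at)
    assignment = []
    for task_id, duration in enumerate(task_durations):
        start, worker = heapq.heappop(free_at)
        assignment.append((task_id, worker, start))
        heapq.heappush(free_at, (start + duration, worker))
    makespan = max(t for t, _ in free_at)
    return assignment, makespan


if __name__ == "__main__":
    # Hypothetical parsing tasks (durations in seconds) and two workers.
    plan, total = list_schedule([3, 1, 2, 2], n_workers=2)
    for task_id, worker, start in plan:
        print(f"task {task_id} -> worker {worker} at t={start}")
    print("makespan:", total)
```

In a real master/worker deployment the durations are not known in advance; the same effect is obtained by handing the next task in the list to whichever worker reports back first.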

A Statistical Model for Multiparty Electoral Data

Accessed: February 18, 2015 1:11:09 PM EST
Citable link: http://nrs.harvard.edu/urn-3:HUL.InstRepos:3992146
Terms of use: This article was downloaded from Harvard University's DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at

A Bayesian Decision Model for Intelligent Routing in Sensor Networks

Since nodes in model 1 do not know which of their respective neighbors are closer to the sink, the decisions at node i in this model will be based on the energy estimation at any of the [r]

Dynamical Modeling Techniques for Biological Time Series Data

On the other hand, this thesis covers the characterization of dynamically differentiable brain states in zebrafish in the context of epilepsy and epileptogenesis. Zebrafish larvae represent a valuable animal model for the study of epilepsy due to both their genetic and dynamical resemblance to humans [9, 10]. The fundamental premise of this research is the early appearance of subtle functional changes preceding the clinical symptoms of seizures. More generally, this idea, based on bifurcation theory, can be described as a progressive loss of resilience of the brain and, ultimately, its transition from a healthy state to another state characterizing the disease [11]. First, the morphological signatures of seizures generated by distinct pathological mechanisms are investigated. For this purpose, a range of mathematical biomarkers that characterize relevant dynamical aspects of the neurophysiological signals are considered. Such mathematical markers are later used to address the subtle manifestations of early epileptogenic activity. Finally, the feasibility of a probabilistic prediction model that indicates the susceptibility of seizure emergence over time is investigated. The existence of alternative stable system states and their sudden and dramatic changes have notably been observed in a wide range of complex systems such as ecosystems, climate or financial markets [12, 13, 14]. Overall, the frameworks of systems identification theory, systems control theory, (non)linear time series analysis, dynamical bifurcation theory and machine learning constitute the foundations upon which both the reconstruction of gene regulatory networks and the investigation of brain vulnerability to epileptic seizure are addressed. Hereafter, the background underlying these two problems is introduced.

Exact Bayesian inference via data augmentation

We obtain the exact posterior distribution for the parameters of the INAR(p) model (p = 0, 1, 2, 3). (The INAR(0) corresponds to independent and identically distributed Poisson data.) We also computed the marginal log-likelihood (evidence) for the models for comparison. The results are presented in Table 2. There is clear evidence from the marginal log-likelihood for p = 2. Applying the BIC-based penalisation prior used in Enciso-Mora et al. (2009a), where for p = 0, 1, 2, 3 the prior on INAR(p) was set proportional to $n^{-p/2}$, gives posterior probabilities of 0.0030, 0.7494 and 0.2476 for the INAR(1), INAR(2) and INAR(3) models, respectively. The total number of categories grows rapidly with the order p of the model, and computing $\{\pi(y|x)\}$ using Fortran95 for the INAR(1), INAR(2) and INAR(3) models took less than a second, 8 seconds and 45 minutes, respectively. It should be noted that the INAR(3) model was at the limits of what is computationally feasible, requiring over 1500 MB of computer memory. The memory limitation is due to the total number of categories, and for smaller data sets, either in terms of n or the magnitude of the $x_t$'s, it would
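
The step from marginal log-likelihoods to posterior model probabilities under the prior proportional to n^(-p/2) can be sketched as follows. This is an illustration only: the values of Table 2 are not reproduced in this excerpt, so the log evidences and the sample size used below are hypothetical placeholders, and the function name is an assumption.

```python
import math


def posterior_model_probs(log_evidences, n, orders):
    """Posterior probabilities of models with orders p, assuming
    prior(p) proportional to n**(-p/2) and the given log evidences."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Unnormalised log posterior: log evidence + log prior.
    log_post = [le - 0.5 * p * math.log(n)
                for le, p in zip(log_evidences, orders)]
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical log evidences for INAR(0)..INAR(3) and sample size.
    probs = posterior_model_probs([-260.0, -245.0, -237.0, -236.0],
                                  n=100, orders=[0, 1, 2, 3])
    for p, pr in enumerate(probs):
        print(f"INAR({p}): {pr:.4f}")
```

The prior penalises each additional lag by a factor of n^(-1/2), so a higher-order model needs a correspondingly larger evidence to be preferred.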

Input data distribution estimations for a pricing model to optimize the duration of promotion periods for airlines

Basically, this table indicates whether, with j seats available on day i before the departure date, it is optimal to run a promotion (represented by a 1) or not (represented by a 0). For example, if we are 8 days before the departure day and have 24 seats available, we must run the promotion until there are 12 seats available. It is important to note that if the 12 seats are not sold that day, the policy could change the next day. Continuing with the example, if only 2 seats are sold on day -8, then on day -7 it is not optimal to run the promotion until there are 20 seats available. So the promotional policy varies depending on the number of days before departure and the number of seats available. Specifically, in this result we can see that the promotion is run on the days closest to the departure date and when more than ten seats are available. This makes sense, since we want to maximize the number of seats sold: if many seats remain in the days close to departure, something needs to be done. On the other hand, the promotion starts only when more than 10 seats are available, which may be because the mean arrival rate is close to ten, so on average 10 people are going to buy at the regular fare.

A time to learn, a time to teach

A global high prevalence of short sleeping time, a slight increase of sleep time in adolescents with UBN, and different patterns of wake activities that predict sleep deficit, depending on the presence of UBN, were found. The poor academic achievement, increased risk of accidents and adverse health outcomes associated with sleep deprivation support the view that sleep is an additional unsatisfied basic need that worsens living conditions at this age. The results may help to design public health policies that contribute to ameliorating this adverse situation. In our study the presence of UBN increased rise time and sleep duration. This contradicts previous literature, where a lower socioeconomic background is usually associated with shorter sleep duration and more sleep disruption in adolescents. Socioeconomic demographics like income, educational level, and employment status are usually associated with more delayed, shorter, and less consistent sleep patterns [20]. However, none of these studies focused on situations of extreme poverty. Among the factors associated with the presence of UBN that may explain these findings, attendance at nearby schools probably explains the increased rise time and, indirectly, the increased sleep time. The association between UBN and attendance at neighborhood schools is as expected, since better schools tend to be available to families with higher socio-economic status through residential mobility and enrolment in private schools. Another factor that could explain the increased rise time is the observed lower percentage of children who attend extra-curricular intellectual or physical activities. School starting time and full-day schooling were strong predictors of sleep deficit in adolescents with and without UBN.
Starting school in the morning is a well-recognized risk factor for sleep deprivation, leading to less time spent in bed, worse sleep quality and increased daytime sleepiness, which in turn lead to bad mood and poor performance. Unlike other school systems, in Argentina some schools have half-day schedules while others have full-day schedules. The extended day is expected to pose a higher risk of sleep deficit, because it combines an early school start with being at school most of the day, thus preventing the possibility of taking naps or delaying bedtime.

Quantifying dimensionality: Bayesian cosmological model complexities

seven-) dimensional [1], and modern likelihoods introduce a host of nuisance parameters to combat the influence of foregrounds and systematics. For example, the Planck likelihood [36] is in total 21-dimensional, the DES likelihood [2] is 26-dimensional, and their combination 41-dimensional (Table I). While samples from the posterior distribution represent a near lossless compression of the information present in this distribution, it goes without saying that visualizing a 40-dimensional object is challenging. Triangle and corner plots [37] represent marginalized views of this information and can hide correlations and constraints between three or more parameters. The fear is that one could misdiagnose a dataset that has powerful constraints if Fig. 1 occurred in higher dimensions. It would be helpful if there were a number d, similar to the Kullback-Leibler divergence D, which quantifies the effective number of constrained parameters.

A Financial Data Mining Model for Extracting Customer Behavior

For better market analysis and customer relationship management, Information Technology (IT) is a significant tool for a company. It helps business organizations to enhance competence and sustain continuous growth of company business (Chung et al., 2009). According to Kwan et al. (2005), technology can support knowledge management in business, including data warehousing, data mining, the Internet and document management systems. In recent decades, data mining has been applied to a broad range of topics and areas (Hosseini et al., 2009; Ting et al., 2009; Ngai et al., 2008; Kirkos et al., 2006). Most business organizations use it to find problem areas and allow managers to make strategic decisions that will allow the business organizations to succeed. It is true that IT plays a significant role in helping business organizations to improve their performance. In order to better understand the reasoning behind customers' buying patterns, many companies use automated tools to study the behavior of their customers. Once relevant information has been obtained, it can be used in a way that allows the organization to predict the behavior of its clients. With the rapid development of information technology, the biggest challenge is not only getting the important information that accumulates daily in databases, but also searching through such huge databases to find relevant connections. However, patterns among the data are not easy to extract, because the information must be specific and refined. To successfully apply data mining to the information obtained, a company must be able to understand the connections between the business strategies and the models that are created within the data mining programs. However, many managers do not notice the importance of data and information for data analysis.
Also, most managers do not understand the relationships within the data because they lack a technical background. For example, a financial marketing manager may not understand the relationship between the hidden patterns and the customer's portfolio. Therefore, it is important to have a sophisticated tool to help companies find the relationships between different data (Zhang and Zhou, 2004).
