Feed aggregator

Views of Views

Tom Kyte - Mon, 2019-10-21 03:45
I remember hearing some time ago that creating views based upon other existing views should be avoided as it can often confuse the optimiser and result in full table scans. I expect that this is just another urban myth however I would be intereste...
Categories: DBA Blogs

Inserts with APPEND Hint.

Tom Kyte - Mon, 2019-10-21 03:45
<code>insert /*+ append */ into t select rownum,mod(rownum,5) from all_objects where rownum <=1000 call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- -----...
Categories: DBA Blogs

Kill a session from database procedure

Tom Kyte - Mon, 2019-10-21 03:45
How can I kill a session from a stored database procedure? Is there a way to do this?
Categories: DBA Blogs

Solr Sharding – Concepts & Methods

Yann Neuhaus - Sun, 2019-10-20 03:06

A few weeks ago, I published a series of blogs on Alfresco Clustering, including Solr Sharding. At that time, I planned to first explain what Solr Sharding really is and what the different concepts and methods around it are. Unfortunately, I didn’t get the time to write this blog, so I had to post the one related to Solr even before explaining the basics. Today, I’m here to right my wrong! Obviously, this blog has a focus on Alfresco-related Solr Sharding since that’s what I do.

I. Solr Sharding – Concepts

Sharding, in general, is the partitioning of a set of data in a specific way. There are several possibilities to do that, depending on the technology you are working with. In the scope of Solr, Sharding is therefore the split of the Solr index into several smaller indices. You might be interested in the Solr Sharding because it improves the following points:

  • Fault Tolerance: with a single index, if you lose it, then… you’ve lost everything. If the index is split into several indices, then even if you lose one part, all the others will continue working
  • High Availability: it provides more granularity than the single index. You might, for example, want a few small indices without HA and some others with HA because you configured them to contain some really important nodes of your repository
  • Automatic Failover: Alfresco knows automatically (with Dynamic Registration) which Shards are up-to-date and which ones are lagging behind, so it automatically chooses the best Shards to handle the search queries and you get the best results possible. In combination with the Fault Tolerance above, this gives the best possible HA solution with the fewest possible resources
  • Performance improvements: indexing is faster since several Shards index the same repository, so each Shard does less work (depending on the Sharding Method). Searches are faster since the search query is processed by all Shards in parallel on smaller parts of the index instead of being one single query against the full index

Based on benchmarks, Alfresco considers that a Solr Shard can contain up to 50 to 80 million nodes. This is obviously not a hard limit (you can have a single Shard with 200 million nodes), but it is a best practice if you want to keep a fast and reliable index. With older versions of Alfresco (before version 5.1), you couldn’t create Shards because Alfresco didn’t support it, so at that time there was no other solution than having a single big index.

There is one additional thing that must be understood here: the 50M-node soft limit is 50M nodes in the index, not in the repository. Let’s assume that you are using the DB_ID_RANGE method (see below for the explanation) with an assumed split of 65% live nodes, 20% archived nodes and 15% others (not indexed: renditions, other stores, …). So, if we are talking about the “workspace://SpacesStore” nodes (the live ones), then to fill a Shard with 50M nodes we have to use a DB_ID_RANGE of 100*50M/65 ≈ 77M. Basically, the Shard should be more or less “full” once there are 77M IDs in the Database. For the “archive://SpacesStore” nodes (the archived ones), it would be 100*50M/20 = 250M.
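As a rough worked sketch of that arithmetic (the 65%/20% split is the assumption above, and alf_node is the Alfresco database table whose id column drives the DB ID based methods):

-- Target: ~50M indexed nodes per Shard
-- Live nodes are ~65% of all DB IDs, so the range width per Shard is:
--   100 * 50,000,000 / 65 ≈ 77,000,000 IDs  (e.g. shard.range=0-77000000, then 77000000-154000000, ...)
-- Archived nodes are ~20% of all DB IDs, so the equivalent width would be:
--   100 * 50,000,000 / 20 = 250,000,000 IDs
-- To see how far the ID sequence has already grown, you can check the repository database:
SELECT MAX(id) AS current_max_db_id FROM alf_node;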

Alright so what are the main concepts in the Solr Sharding? There are several terms that need to be understood:

  • Node: It’s a Solr Server (a Solr installed using the Alfresco Search Services). Below, I will use “Solr Server” instead because I already use “nodes” (lowercase) for the Alfresco Documents so using “Node” (uppercase) for the Solr Server, it might be a little bit confusing…
  • Cluster: It’s a set of Solr Servers all working together to index the same repository
  • Shard: A part of the index. In other words, it’s a representation (virtual concept) of the index composed of a certain set of nodes (Alfresco Documents)
  • Shard Instance: It’s one Instance of a specific Shard. A Shard is like a virtual concept while the Instance is the implementation of that virtual concept for that piece of the index. Several Shard Instances of the same Shard will therefore contain the same set of Alfresco nodes
  • Shard Group: It’s a collection of Shards (several indices) that forms a complete index. Shards are part of the same index (same Shard Group) if they:
    • Track the same store (E.g.: workspace://SpacesStore)
    • Use the same template (E.g.: rerank)
    • Have the same maximum number of Shards (“numShards“)
    • Use the same configuration (Sharding methods, Solr settings, …)

Shard is often (wrongly) used in place of Shard Instance, which might lead to some confusion… When you are reading “Shard”, sometimes it means the Shard itself (the virtual concept) and sometimes it means all of its Shard Instances. This is how these concepts can look:
Solr Sharding - Concepts

II. Solr Sharding – Methods

Alfresco supports several methods for the Solr Sharding and they all have different attributes and different ways of working:

  • MOD_ACL_ID (ACL v1): Alfresco nodes and ACLs are grouped by their ACL ID and stored together in the same Shard. Different ACL IDs will be assigned randomly to different Shards (depending on the number of Shards you defined). Each Alfresco node using a specific ACL ID will be stored in the Shard already containing this ACL ID. This simplifies the search requests from Solr since ACLs and nodes are together, so permission checking is simple. If you have a lot of documents using the same ACL, then the distribution will not be even between Shards. Parameters:
    • shard.method=MOD_ACL_ID
    • shard.instance=<shard.instance>
    • shard.count=<shard.count>
  • ACL_ID (ACL v2): This is the same as MOD_ACL_ID; the only difference is the method used to assign the ACLs to the Shards, so the distribution is more even. However, if you have a lot of documents using the same ACL, you still have the same issue. Parameters:
    • shard.method=ACL_ID
    • shard.instance=<shard.instance>
    • shard.count=<shard.count>
  • DB_ID: This is the default Sharding Method in Solr 6 which will evenly distribute the nodes in the different Shards based on their DB ID (“alf_node.id“). The ACLs are replicated on each of the Shards so that Solr is able to perform the permission checking. If you have a lot of ACLs, then this will obviously make the Shards a little bit bigger, but this is usually insignificant. Parameters:
    • shard.method=DB_ID
    • shard.instance=<shard.instance>
    • shard.count=<shard.count>
  • DB_ID_RANGE: Pretty much the same as DB_ID, but instead of looking at each DB ID one by one, it dispatches the DB IDs from the same range into the same Shard. The ranges are predefined at the Shard Instance creation and you cannot change them later, but this is also the only Sharding Method that allows you to add new Shards dynamically (auto-scaling) without the need to perform a full reindex. The lower value of the range is included and the upper value is excluded (for Math lovers: [begin-end[ ;)). Since DB IDs are incremental (they increase over time), a search query with a date filter might end up as simple as checking a single Shard. Parameters:
    • shard.method=DB_ID_RANGE
    • shard.range=<begin-end>
    • shard.instance=<shard.instance>
  • DATE: Months are assigned to the Shards sequentially and nodes are then indexed into the Shard assigned to the month of their date property. Therefore, if you have 2 Shards, each one will contain 6 months (Shard 1 = Months 1,3,5,7,9,11 // Shard 2 = Months 2,4,6,8,10,12). It is possible to assign consecutive months to the same Shard using the “shard.date.grouping” parameter, which defines how many months should be grouped together (a semester for example). If there is no date on a node, the fallback method is DB_ID instead. Parameters:
    • shard.method=DATE
    • shard.key=exif:dateTimeOriginal
    • shard.date.grouping=<1-12>
    • shard.instance=<shard.instance>
    • shard.count=<shard.count>
  • PROPERTY: A property is specified as the base for the Shard assignment. The first time a node is indexed with a new value for this property, the node is assigned randomly to a Shard. Every node coming in with the same value for this property is then assigned to the same Shard. Valid properties are either d:text (single line text), d:date (date only) or d:datetime (date+time). It is possible to use only a part of the property’s value using “shard.regex” (to keep only the first 4 digits of a date for example: shard.regex=^\d{4}). If this property doesn’t exist on a node or if the regex doesn’t match (if any is specified), the fallback method is DB_ID instead. Parameters:
    • shard.method=PROPERTY
    • shard.key=cm:creator
    • shard.instance=<shard.instance>
    • shard.count=<shard.count>
  • EXPLICIT_ID: Pretty similar to PROPERTY, but instead of using the value of a “random” property, this method requires a specific property (d:text) to define explicitly on which Shard the node should be indexed. Therefore, this requires an update of the Data Model to have one property dedicated to the assignment of a node to a Shard. If you are using several types of documents, then you will potentially want to do that for all of them. If this property doesn’t exist on a node or if an invalid Shard number is given, the fallback method is DB_ID instead. Parameters:
    • shard.method=EXPLICIT_ID
    • shard.key=<property> (E.g.: cm:targetShardInstance)
    • shard.instance=<shard.instance>
    • shard.count=<shard.count>

As you can see above, each Sharding Method has its own set of properties. You can define these properties in:

  • The template’s solrcore.properties file in which case it will apply to all Shard Instance creations
    • E.g.: $SOLR_HOME/solrhome/templates/rerank/conf/solrcore.properties
  • The URL/Command used to create the Shard Instance in which case it will only apply to the current Shard Instance creation
    • E.g.: curl -v "http://host:port/solr/admin/cores?action=newCore&…&property.shard.method=DB_ID_RANGE&property.shard.range=0-50000000&property.shard.instance=0"
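For illustration, here is a hedged sketch of what the relevant lines of a template’s solrcore.properties could look like for the DB_ID_RANGE method (the values are examples only, not a recommendation):

shard.method=DB_ID_RANGE
shard.range=0-77000000
shard.instance=0

A second Shard Instance for the next Shard would then typically be created with shard.range=77000000-154000000 and shard.instance=1, either in its own solrcore.properties or through the property.shard.* URL parameters shown above.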

Summary of the benefits of each method:
Solr Sharding - Benefits

First supported versions for the Solr Sharding in Alfresco:
Solr Sharding - Availability

Hopefully, this is a good first look into the Solr Sharding. In a future blog, I will talk about the creation process and show some examples of what is possible. If you want to read more on the subject, don’t hesitate to take a look at the Alfresco documentation; it doesn’t explain everything, but it is still a very good starting point.

The article Solr Sharding – Concepts & Methods appeared first on Blog dbi services.

Why is the cost for TABLE ACCESS BY INDEX ROWID so high for only one row

Tom Kyte - Sat, 2019-10-19 15:45
Dear Tom, I have a problem with a query on a table that has a function-based index. create index : <code> create index customer_idx_idno on Customer (lower(id_no)) ; --- id_no varchar2(40) </code> <b>Query 1:</b> execute time 0.031s but cost 5,149, 1 row ...
Categories: DBA Blogs

How to get unique values/blanks across all columns

Tom Kyte - Sat, 2019-10-19 15:45
Hi, I have a wide table with 200 odd columns. Requirement is to pivot the columns and display the unique values and count of blanks within each column <code>CREATE TABLE example( c1 VARCHAR(10), c2 VARCHAR(10), c3 VARCHAR(10) ); / INSERT ...
Categories: DBA Blogs

Elastic search using Oracle 19c

Tom Kyte - Sat, 2019-10-19 15:45
Team, Very recently we got a question from our customer: "can I replace Elastic search using Oracle 19c or any version of Oracle database prior to that?" Any inputs/directions on that please - kindly advise.
Categories: DBA Blogs

Add NULLABLE column with DEFAULT values to a large table

Tom Kyte - Sat, 2019-10-19 15:45
I'm adding a number of columns with DEFAULT values that are NULLABLE to a large table e.g <code> alter table big_table add (col1 varchar2(1) default 0, col2 varchar2(1) default 0); </code> It's taking a long time to do because Oracle is e...
Categories: DBA Blogs

Create table from select with changes in the column values

Tom Kyte - Sat, 2019-10-19 15:45
Hello, In the work we have an update script that takes around 20 hours, and some of the most demanding queries are updates where we change some values, something like: <code> UPDATE table1 SET column1 = DECODE(table1.column1,null,null,'no...
Categories: DBA Blogs

Data archival and purging for OLTP database.

Tom Kyte - Sat, 2019-10-19 15:45
Hi Tom, Need your suggestion regarding a data archival and purging solution for an OLTP db. Currently we are planning the below approach. The database size is 150 GB and we are planning to run the jobs monthly. 1) Generate flat files from table based on...
Categories: DBA Blogs

Job to end in case connection not establishing with utl_http.begin_request

Tom Kyte - Sat, 2019-10-19 15:45
I am tracking around 17000 orders through a web service via PL/SQL to a destination server. I am running multiple jobs in batches (500 orders each job) for invoking the webservice to get the order status, so around 34 jobs are running (17000/500) ...
Categories: DBA Blogs

Inserting values into a table with '&'

Tom Kyte - Sat, 2019-10-19 15:45
Hi, I want to insert a values into a table as follows: create table test (name varchar2(35)); insert into test values ('&Vivek'); I tried the escape character '\' but the system asks for a value of the substitution variable. I also did a...
Categories: DBA Blogs

SELECT ANY DICTIONARY - What Privileges Does it Have - SELECT_CATALOG_ROLE

Pete Finnigan - Sat, 2019-10-19 15:45
There have been a few blog posts over the years discussing the difference between SELECT ANY DICTIONARY and the SELECT_CATALOG_ROLE. Hemant posted in 2014 about the difference between SELECT ANY DICTIONARY and SELECT_CATALOG_ROLE. This post was a....[Read More]

Posted by Pete On 11/10/19 At 01:59 PM

Categories: Security Blogs

What Privileges Can you Grant On PL/SQL?

Pete Finnigan - Sat, 2019-10-19 15:45
Oracle has a lot of privileges and models; privileges can be granted to users, roles and also, since 12c, roles can be granted to PL/SQL code (I will not discuss this aspect here as I will blog separately about grants....[Read More]

Posted by Pete On 08/10/19 At 01:43 PM

Categories: Security Blogs

ORA-01950 Error on a Sequence - Error on Primary Key Index

Pete Finnigan - Sat, 2019-10-19 15:45
I posted yesterday a blog about an error on a sequence of ORA-01950 on tablespace USERS - ORA-01950 Error on a Sequence . This was attributed to the sequence by me because that's where the error in Oracle was pointing....[Read More]

Posted by Pete On 01/10/19 At 01:12 PM

Categories: Security Blogs

ORA-01950 Error on a Sequence

Pete Finnigan - Sat, 2019-10-19 15:45
UPDATE: I have updated information for this post and rather than make this one much longer I created a new post - please see ORA-01950 Error on a Sequence - Error on Primary Key Index. Wow, it's been a while....[Read More]

Posted by Pete On 30/09/19 At 01:42 PM

Categories: Security Blogs

Machine Learning with SQL

Andrejus Baranovski - Sat, 2019-10-19 10:25
Python (and soon JavaScript with TensorFlow.js) is a dominant language for Machine Learning. What about SQL? There is a way to build/run Machine Learning models in SQL. There could be a benefit to running model training close to the database, where the data stays. With SQL we can leverage strong data analysis out of the box and run algorithms without fetching data to the outside world (which could be an expensive operation in terms of performance, especially with large datasets). This post describes how to do Machine Learning in the database with SQL.
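As a minimal, hedged sketch of what in-database Machine Learning with SQL can look like (this uses Oracle's DBMS_DATA_MINING package; the table, column and model names are hypothetical and not taken from the post):

BEGIN
  -- Train a classification model directly on a table inside the database
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'CHURN_MODEL',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'CUSTOMER_TRAINING_DATA',
    case_id_column_name => 'CUSTOMER_ID',
    target_column_name  => 'CHURNED');
END;
/

-- Scoring then happens in plain SQL, without moving the data anywhere
SELECT customer_id,
       PREDICTION(CHURN_MODEL USING *) AS predicted_churn
FROM   customer_scoring_data;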



Read more in my Towards Data Science post.

Basic Replication -- 8 : REFRESH_MODE ON COMMIT

Hemant K Chitale - Sat, 2019-10-19 09:26
So far, in previous posts in this series, I have demonstrated Materialized Views that are set to REFRESH ON DEMAND.

You can also define a Materialized View that is set to REFRESH ON COMMIT -- i.e. every time DML against the Source Table is committed, the MV is also immediately updated.  Such an MV must be in the same database  (you cannot define an ON COMMIT Refresh across two databases  -- to do so, you have to build your own replication code, possibly using Database Triggers or external methods of 2-phase commit).

Here is a quick demonstration, starting with a Source Table in the HEMANT schema and then building a FAST REFRESH MV in the HR schema.

SQL> show user
USER is "HEMANT"
SQL> create table hemant_source_tbl (id_col number not null primary key, data_col varchar2(30));

Table created.

SQL> grant select on hemant_source_tbl to hr;

Grant succeeded.

SQL> create materialized view log on hemant_source_tbl;

Materialized view log created.

SQL> grant select on mlog$_hemant_source_tbl to hr;

Grant succeeded.

SQL>
SQL> grant create materialized view to hr;

Grant succeeded.

SQL> grant on commit refresh on hemant_source_tbl to hr;

Grant succeeded.

SQL>
SQL> grant on commit refresh on mlog$_hemant_source_tbl to hr;

Grant succeeded.

SQL>


Note : I had to grant the CREATE MATERIALIZED VIEW privilege to HR for this test case. Also, as the MV is to Refresh ON COMMIT, two additional object-level grants on the Source Table and the Materialized View Log are required as the Refresh is across schemas.

SQL> connect hr/HR@orclpdb1
Connected.
SQL> create materialized view hr_mv_on_commit
2 refresh fast on commit
3 as select id_col as primary_key_col, data_col as value_column
4 from hemant.hemant_source_tbl;

Materialized view created.

SQL>


Now that the Materialized View is created successfully, I will test DML against the table and check that an explicit REFRESH call (e.g. DBMS_MVIEW.REFRESH or DBMS_REFRESH.REFRESH) is not required.

SQL> connect hemant/hemant@orclpdb1
Connected.
SQL> insert into hemant_source_tbl values (1,'First');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from hr.hr_mv_on_commit;

PRIMARY_KEY_COL VALUE_COLUMN
--------------- ------------------------------
1 First

SQL> connect hr/HR@orclpdb1
Connected.
SQL> select * from hr_mv_on_commit;

PRIMARY_KEY_COL VALUE_COLUMN
--------------- ------------------------------
1 First

SQL>


The Materialized View in the HR schema was refreshed immediately, without an explicit REFRESH call.

Remember : An MV that is to REFRESH ON COMMIT must be in the same database as the Source Table.
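(For comparison, an MV defined to REFRESH ON DEMAND -- as in the earlier posts of this series -- is only updated when you explicitly request it. A hedged sketch, with a hypothetical MV name:)

SQL> exec DBMS_MVIEW.REFRESH(list => 'HR.HR_MV_ON_DEMAND', method => 'F');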




Categories: DBA Blogs

pgconf.eu – Welcome to the community

Yann Neuhaus - Fri, 2019-10-18 15:41

On Tuesday I started my journey to Milan to attend my first pgconf.eu, which was also my first big conference. I was really excited about what was coming up for me. How would it be to become a visible part of the community? How would it be to give my first presentation in front of so many people?

The conference started with the welcome and opening session. It took place in a huge room, big enough to give all of the participants a seat. It is really amazing how big this community is, and it is still growing. So many people from all over the world (Japan, USA, Chile, Canada…) attended this conference.

And suddenly I realized: this is the room where I have to give my session. Some really strange feelings came up. This is my first presentation at a conference, this is the main stage, and there is space for so many people! I really hoped they would make it smaller for me. But there was something else: anticipation.

But first I want to give you some impressions from my time at the pgconf. It was amazing to talk to one of the main developers of Patroni. I was really nervous when I just went up to him and said: “Hi, may I ask you a question?” Of course he didn’t say no. All the other ladies and gentlemen I met (the list is quite long) are so nice and really open minded (is this because they all work with an open source database?). And of course a special thanks to Pavel Golub for the great picture. Find it in Daniel’s blog.
Besides meeting all these great people, I enjoyed some really informative and cool sessions.


Although I still hoped they were going to make the room smaller for my presentation, of course they didn’t. So I had only one chance:

And I did it, and afterwards I filed it under “good experience”. A huge room is not so different from a small one.

Now that I am back home, I want to say: thanks pgconf.eu and dbi services for giving me this opportunity, and thanks to the community for this warm welcome.

The article pgconf.eu – Welcome to the community appeared first on Blog dbi services.

CBO Oddities – 1

Jonathan Lewis - Fri, 2019-10-18 12:10

I’ve decided to do a little rewriting and collating so that I can catalogue related ideas in an order that makes for a better narrative. So this is the first in a series of notes designed to help you understand why the optimizer has made a particular choice and why that choice is (from your perspective) a bad one, and what you can do either to help the optimizer find a better plan, or subvert the optimizer and force a better plan.

If you’re wondering why I choose to differentiate between “help the optimizer” and “subvert the optimizer” consider the following examples.

  • A query is joining two tables in the wrong order with a hash join when you know that a nested loop join in the opposite order would be far better, because you know that the data you want is very nicely clustered and there’s a really good index that would make access to that data very efficient. You check the table preferences and discover that the table_cached_blocks preference (see end notes) is at its default value of 1, so you set it to 16 and gather fresh stats on the indexes on the table. Oracle now recognises the effectiveness of this index and changes plan accordingly (a sketch of the calls involved follows this list).
  • The optimizer has done a surprising transformation of a query, aggregating a table before joining to a couple of other tables when you were expecting it to use the joins to eliminate a huge fraction of the data before aggregating it.  After a little investigation you find that setting hidden parameter _optimizer_distinct_placement to false stops this happening.
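A hedged sketch of the first example (the schema, table and index names are hypothetical; 16 is just the value used above):

begin
        -- tell the optimizer that up to 16 table blocks may be assumed cached when it
        -- derives the clustering_factor of the indexes on this table
        dbms_stats.set_table_prefs(
                ownname => 'test_user',
                tabname => 't1',
                pname   => 'table_cached_blocks',
                pvalue  => '16'
        );
        -- regather the index stats so the new preference takes effect
        dbms_stats.gather_index_stats(
                ownname => 'test_user',
                indname => 't1_i1'
        );
end;
/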

You may find the distinction unnecessarily fussy, but I’d call the first example “helping the optimizer” – it gives the optimizer some truthful information about your data that is potentially going to result in better decisions in many different statements – and the second example “subverting the optimizer” – you’ve brute-forced it into not taking a path you didn’t like, but at the same time you may have stopped that feature from appearing in other ways or in other queries. Of course, you might have minimised the impact of setting the parameter by using the opt_param() hint to apply the restriction to just this one query; nevertheless it’s possible that there is a better plan for the query that would have used the feature at some other point in the query if you’d managed to do something to help the optimizer rather than constraining it.
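A hedged sketch of that query-level option (the tables and the query itself are hypothetical; only the hint is the point):

select
        /*+ opt_param('_optimizer_distinct_placement','false') */
        t1.owner, count(distinct t2.object_type)
from
        t1, t2
where
        t2.id1 = t1.id
group by
        t1.owner
;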

What’s up with the Optimizer

It’s likely that most of the articles will be based around interpreting execution plans since those are the things that tell us what the optimizer thinks will happen when it executes a statement, and within execution plans there are three critical aspects to consider –

  1. the numbers (most particularly Cost and Rows),
  2. the shape of the plan,
  3. the Predicate Information.

I want to use this note to make a couple of points about just the first of the three.

  • First – the estimates on any one line of an execution plan are “per start” of the line; some lines of an execution plan will be called many times in the course of a statement. In many cases the Rows estimate from one line of a plan will dictate the number of times that some other line of the plan will be executed – so a bad estimate of “how much data” can double up as a bad estimate of “how many times”, leading to a plan that looks efficient on paper but does far too much work at run-time. A line in a plan that looks a little inefficient may be fine if it executes only once; a line that looks very efficient may be a disaster if it executes a million times. Being able to read a plan and spot the places where the optimizer has produced a poor estimate of Rows is a critical skill – and there are many reasons why the optimizer produces poor estimates. Being able to spot poor estimates depends fairly heavily on knowing the data, but if you know the generic reasons for the optimizer producing poor estimates you’ve got a head start for recognising and addressing the errors when they appear (a sketch of one way to check estimates against run-time reality follows this list).
  • Second – Cost is synonymous with Time. For a given instance at a given moment there is a simple, linear, relationship between the figure that the optimizer reports for the Cost of a statement (or subsection of a statement) and the Time that the optimizer reports. For many systems (those that have not run the calibrate_io procedure) the Time is simply the Cost multiplied by the time the optimizer thinks it will take to satisfy a single block read request, and the Cost is the optimizer’s estimate of the I/O requirement to satisfy the statement – with a fudge factor introduced to recognise the fact that a “single block” read request ought to complete in less time than a “multiblock” read request. Generally speaking the optimizer will consider many possible plans for a statement and pick the plan with the lowest estimated cost – but there is at least one exception to this rule, and it is an unfortunate weakness in the optimizer that there are many valid reasons why its estimates of Cost/Time are poor. Of course, you will note that the values that Oracle reports for the Time column are only accurate to the second – which isn’t particularly helpful when a single block read typically operates in the range of a few milliseconds.
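As mentioned in the first bullet above, here is a hedged sketch of one common way to compare the optimizer’s Rows estimates with what actually happened at run-time (the table name and predicate are hypothetical):

select  /*+ gather_plan_statistics */
        count(*)
from    t1
where   owner = 'SYS'
;

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

-- The E-Rows vs. A-Rows columns (and the Starts column) in the resulting plan are
-- where the per-start estimates and the "how many times" effects become visible.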

To a large degree the optimizer’s task boils down to:

  • What’s the volume and scatter of the data I need
  • What access paths, with what wastage, are available to get to that data
  • How much time will I spend on I/O reading (and possibly discarding) data to extract the bit I want

Of course there are other considerations like the amount of CPU needed for a sort, the potential for I/O as sorts or hash joins spill to disk, the time to handle a round-trip to a remote system, and RAC variations on the basic theme. But for many statements the driving issue is that any bad estimates of “how much data” and “how much (real) I/O” will lead to bad, potentially catastrophic, choices of execution plan. In the next article I’ll list all the different reasons (that I can think of at the time) why the optimizer can produce bad estimates of volume and time.

References for Cost vs. Time

References for table_cached_blocks:

 
